Help navigating through and selecting information from an XML document

straffin · October 1, 2015, 8:59pm

I hate XML. I really do.

I’ve looked through the other XML discussions here and, though they’ve gotten me far, I just can’t seem to grok how to get from where they’ve gotten me to where I need to be. Here’s a sanitized approximation of my XML:

<?xml version="1.0" encoding="UTF-8"?>
<ServerSettings DomainId="123">
<CommConf>
<AgentCommunicationSetting AlwaysConnect="1"/>
<ServerList Name="Server List">
<ServerPriorityBlock Name="Priority1">
<Server Address="1.2.3.4"/>
<Server Address="server1.servers.com"/>
</ServerPriorityBlock>
<ServerPriorityBlock Name="Priority2">
<Server Address="2.3.4.5"/>
<Server Address="server2.servers.com"/>
</ServerPriorityBlock>
</ServerList>
</CommConf>
</ServerSettings>

Ideally, I’m trying to pull out the FQDN of the Server in the ServerPriorityBlock named “Priority1”. Alternatively, I’ll take a sorted, comma-separated list of all of the “Address” values in that ServerPriorityBlock. Here’s what I’ve managed so far:

Q: node values of attributes whose (node name of it = "Address") of child nodes of selects "ServerSettings/CommConf/ServerList/ServerPriorityBlock" of xml document of file "C:\data.xml" 
A: 1.2.3.4
A: server1.servers.com
A: 2.3.4.5
A: server2.servers.com

I’m trying to get this to just be “server1.servers.com” or, barring that, “1.2.3.4, server1.servers.com”. Any ideas? (Also, any good documentation for traversing and extracting data from XML documents with BigFix Relevance that I’m just not finding with my Google-fu?)

JasonWalker · October 2, 2015, 1:48am

One method would be to group by ServerPriorityBlock, iterating over each block of ‘child nodes of it’; then concatenating together the attributes -

Q: (concatenation ", " of node values of attributes whose (node name of it = "Address") of child nodes of it) of selects "ServerSettings/CommConf/ServerList/ServerPriorityBlock" of xml document of file "C:\temp\test.xml" 
A: 1.2.3.4, server1.servers.com
A: 2.3.4.5, server2.servers.com
T: 0.411 ms
I: plural string
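
For anyone who wants to check the grouping logic outside of BigFix, here is a minimal Python sketch of the same idea: visit each ServerPriorityBlock, then join the Address values of its children. It is not Relevance, it uses only the standard library, and the sample XML is the sanitized data from the question (with the XML declaration left out for brevity).

```python
import xml.etree.ElementTree as ET

# Sanitized sample from the question (XML declaration omitted for brevity).
XML = """<ServerSettings DomainId="123">
<CommConf>
<AgentCommunicationSetting AlwaysConnect="1"/>
<ServerList Name="Server List">
<ServerPriorityBlock Name="Priority1">
<Server Address="1.2.3.4"/>
<Server Address="server1.servers.com"/>
</ServerPriorityBlock>
<ServerPriorityBlock Name="Priority2">
<Server Address="2.3.4.5"/>
<Server Address="server2.servers.com"/>
</ServerPriorityBlock>
</ServerList>
</CommConf>
</ServerSettings>"""

root = ET.fromstring(XML)  # root is the <ServerSettings> element

# One result per ServerPriorityBlock, like iterating over each block in Relevance.
blocks = root.findall("CommConf/ServerList/ServerPriorityBlock")
per_block = [
    ", ".join(server.get("Address") for server in block.findall("Server"))
    for block in blocks
]

for line in per_block:
    print(line)
```

Because the join happens inside the loop over blocks, each block produces its own string instead of one flat list of four addresses.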

Another method would be to take all of the Address nodes together, and remove any that match an IP address format using a regular expression:

Q: node values whose (not exists matches (regex("^[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}$")) of it) of attributes whose (node name of it = "Address") of child nodes of selects "ServerSettings/CommConf/ServerList/ServerPriorityBlock" of xml document of file "C:\temp\test.xml" 
A: server1.servers.com
A: server2.servers.com
T: 0.617 ms
I: plural string
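
The same filter can be tried in Python before it goes into Relevance. This is only a sketch: the pattern is copied from the query above, and the input list is the four Address values from the sample data.

```python
import re

# Same pattern as in the Relevance query: four dot-separated groups of 1-3 digits.
# It only checks the *format*, so a string like "999.999.999.999" would also match.
IPV4_LIKE = re.compile(r"^[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}$")

addresses = ["1.2.3.4", "server1.servers.com", "2.3.4.5", "server2.servers.com"]

# Keep only the values that do not look like an IP address.
hostnames = [a for a in addresses if not IPV4_LIKE.search(a)]

print(hostnames)
```

Filtering out what looks like an IP address keeps hostnames in any domain, whereas a test such as "ends with .com" would miss a server under .net or an internal suffix.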

jgstew · October 2, 2015, 3:14am

I love XML, and in particular XPATH. I would recommend INI files for very simple data, but I like XML for pretty much anything otherwise.

Don’t google for how to do this with BigFix… that will lead you down the wrong path. You need to figure out how to do this with XPATH, and then use the XPATH statement you come up with in the xpaths or selects inspectors.

I haven’t tested this at all, but this should be very close:

node values whose(it as string as trimmed string as lowercase ends with ".com") of selects "ServerSettings/CommConf/ServerList/ServerPriorityBlock[@Name='Priority1']/Server/@Address" of xml document of file "C:\data.xml"

This is a great site to test XPATH: http://codebeautify.org/Xpath-Tester
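
If an offline check is more convenient, the predicate part of that XPATH can also be tried with Python's standard library. This is a rough sketch, not BigFix code. One assumption to be aware of: xml.etree.ElementTree supports only a subset of XPATH, so the trailing /@Address step is replaced here by reading the attribute with get(). The sample XML is the sanitized data from the first post, without the XML declaration.

```python
import xml.etree.ElementTree as ET

# Sanitized sample from the first post (XML declaration omitted for brevity).
XML = """<ServerSettings DomainId="123">
<CommConf>
<AgentCommunicationSetting AlwaysConnect="1"/>
<ServerList Name="Server List">
<ServerPriorityBlock Name="Priority1">
<Server Address="1.2.3.4"/>
<Server Address="server1.servers.com"/>
</ServerPriorityBlock>
<ServerPriorityBlock Name="Priority2">
<Server Address="2.3.4.5"/>
<Server Address="server2.servers.com"/>
</ServerPriorityBlock>
</ServerList>
</CommConf>
</ServerSettings>"""

root = ET.fromstring(XML)  # root is <ServerSettings>, so the path starts below it

# [@Name='Priority1'] is the predicate that restricts the match to one block.
servers = root.findall(
    "CommConf/ServerList/ServerPriorityBlock[@Name='Priority1']/Server"
)
addresses = [s.get("Address") for s in servers]

# Same ".com" test as the whose clause in the Relevance above.
fqdns = [a for a in addresses if a.strip().lower().endswith(".com")]

print(addresses)
print(fqdns)
```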

straffin · October 2, 2015, 1:17pm

Thanks, Jason! I was primarily stuck on how to just get the “Priority1” information, but there’s lots of great sample code in your answer that I’m going to have to remember (or, at least, remember that it’s here and look for it again later).

straffin · October 2, 2015, 1:21pm

Thanks, James! It works great with a little tweaking to match my actual data (and getting rid of the “, it” at the beginning due to a type mismatch). I had assumed that, as with much of how BigFix does things in relevance language, there would be no external sources of information that applied. Thanks for the XPATH info … that URL looks awesome (and so much better to test against than Q&A)!!

JasonWalker · October 2, 2015, 2:22pm

If you want to easily return to a post here later, you can Like or Bookmark it. Then when you select your avatar in the upper-right, you can see your own “Bookmarks” and “Likes Given”.

Another great resource for sample code is http://bigfix.me, if you haven’t come across that site yet.

straffin · October 2, 2015, 2:57pm

Thanks, Jason. I do indeed know about http://bigfix.me and have been using BigFix for the better part of a decade. Unfortunately, the things I usually end up looking for use terms so generic that half of the web comes up in the results.

jgstew · October 2, 2015, 4:48pm

I typically search in google using the following to find examples on bigfix.me :

whatever I am searching about inurl:bigfix.me

jgstew · October 2, 2015, 4:51pm

In the case of the WMI inspectors in BigFix, figure out how to do it with WMI online or with a WMI Explorer GUI, then apply that to the inspector.

In the case of the SQLite inspectors, figure out how to do it with SQLite, then use that in the inspector.

In the case of the XPATH inspectors, figure out how to do it with XPATH in general, then go from there with the inspector.

There are many cases where you should figure out how to do something in the general case, and then apply that to the inspectors. I’m sure there are other examples. All of the inspectors listed above are using an existing API or library to actually do the query, so it isn’t the inspector that is the hard part, it is figuring out the query that would be required even if BigFix wasn’t being used at all.

straffin · October 9, 2015, 2:09pm

Argh … so NOW the problem is that I also need to do this same action on both Windows & Mac OS X and the XML inspectors are Windows-only. :-/ Off to mess with “following text of first blah blah blah” again…

straffin · October 9, 2015, 2:44pm

This works:

concatenation " " of ((substrings between "%22" of following text of first "Priority1" of preceding text of first "</ServerPriorityBlock>" of lines of file "C:\data.xml") whose (it as string as trimmed string as lowercase ends with ".com"))
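
For readers who find the nested "of" chain hard to follow, here is a rough Python sketch of the same slicing steps. It is an illustration only, and it makes one assumption that differs from the Relevance: it works on the whole file content as a single string rather than line by line. The function name priority1_fqdns is made up for this example. In Relevance, "%22" is the escaped form of the double-quote character.

```python
# Sanitized sample from the first post (XML declaration omitted for brevity).
XML = """<ServerSettings DomainId="123">
<CommConf>
<AgentCommunicationSetting AlwaysConnect="1"/>
<ServerList Name="Server List">
<ServerPriorityBlock Name="Priority1">
<Server Address="1.2.3.4"/>
<Server Address="server1.servers.com"/>
</ServerPriorityBlock>
<ServerPriorityBlock Name="Priority2">
<Server Address="2.3.4.5"/>
<Server Address="server2.servers.com"/>
</ServerPriorityBlock>
</ServerList>
</CommConf>
</ServerSettings>"""


def priority1_fqdns(text):
    """Return the quoted values ending in .com inside the Priority1 block."""
    # preceding text of first "</ServerPriorityBlock>"
    head, _, _ = text.partition("</ServerPriorityBlock>")
    # following text of first "Priority1"
    _, found, block = head.partition("Priority1")
    if not found:
        return []
    # Split on the double quote, then keep only the pieces that end in ".com".
    pieces = block.split('"')
    return [p for p in pieces if p.strip().lower().endswith(".com")]


print(" ".join(priority1_fqdns(XML)))
```

The split on the quote character also yields the markup between attribute values, so the ".com" test is what discards those pieces.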

jgstew · October 10, 2015, 6:43am

I’m honestly not sure why that works unless the file is all 1 line. I would expect that relevance to break.

The lack of XML inspectors on OS X has been a pet peeve of mine for a very long time.

JasonWalker · October 10, 2015, 7:58pm

I’m really coming to love Regexen; they seem to solve so many complex cases. Try this on:

//Match the strings between the quotes inside a "Server Address" node
    q: parenthesized parts 2 of matches(regex("(Server Address=[%22])([^%22]*)")) of lines of file "c:\temp\test.xml"
    A: 1.2.3.4
    A: server1.servers.com
    A: 2.3.4.5
    A: server2.servers.com
    T: 0.442 ms
    I: plural substring
    
// Remove the strings that are an IP address
    q: it whose ((not exists matches (regex("^[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}$")) of it)) of parenthesized parts 2 of matches(regex("(Server Address=[%22])([^%22]*)")) of lines of file "c:\temp\test.xml"
    A: server1.servers.com
    A: server2.servers.com
    T: 0.695 ms
    I: plural substring
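
The two steps above can be prototyped in Python before they are moved into Relevance. This is a sketch using the same patterns, with the double quote written literally instead of as %22. The sample lines are the sanitized XML from the first post, and group(2) plays the role of "parenthesized parts 2".

```python
import re

# Sanitized sample from the first post (XML declaration omitted for brevity).
XML = """<ServerSettings DomainId="123">
<CommConf>
<AgentCommunicationSetting AlwaysConnect="1"/>
<ServerList Name="Server List">
<ServerPriorityBlock Name="Priority1">
<Server Address="1.2.3.4"/>
<Server Address="server1.servers.com"/>
</ServerPriorityBlock>
<ServerPriorityBlock Name="Priority2">
<Server Address="2.3.4.5"/>
<Server Address="server2.servers.com"/>
</ServerPriorityBlock>
</ServerList>
</CommConf>
</ServerSettings>"""

# Group 1 is the attribute name plus opening quote, group 2 is the value.
ADDRESS = re.compile(r'(Server Address=["])([^"]*)')
# Format check only: four dot-separated groups of 1-3 digits.
IPV4_LIKE = re.compile(r"^[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}$")

# Step 1: the quoted value of every Server Address attribute, line by line.
values = [
    m.group(2)
    for line in XML.splitlines()
    for m in ADDRESS.finditer(line)
]

# Step 2: drop the values that look like an IP address.
hostnames = [v for v in values if not IPV4_LIKE.search(v)]

print(values)
print(hostnames)
```

Note that this approach ignores which ServerPriorityBlock a line belongs to, so it returns the servers from every block.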

jgstew · October 11, 2015, 12:52am

RegEx is very powerful, but I would recommend using it sparingly and only where required. Other text parsing options are more human readable and deterministic.

RegEx is a bit like WMI or SHA1… it is possible to create a query that given the right input will take so long to execute that it will cause problems.

Use the following sites to help write RegEx

Once you have a RegEx working, it is fairly straightforward to get it to work in relevance.