Parsing an XML document

MattPeterson · March 29, 2017, 7:26pm

Hey,

I’m looking for help parsing an XML document. I’m reading the file on AIX servers, so I can’t use the native XML inspectors. Unfortunately they’re only available on Windows, Debian, Red Hat, SUSE and Ubuntu. I have no idea why AIX isn’t supported.

https://developer.bigfix.com/relevance/reference/xml-dom-document.html

Here is a sample of the XML document I’m trying to retrieve info from:

    <chknetconn>
            <en4>
                    <PingTest>
                            <status>PASSED</status>
                    </PingTest>
                    <chkLocalHost>
                            <status>PASSED</status>
                    </chkLocalHost>
            </en4>
            <en5>
                    <PingTest>
                            <status>PASSED</status>
                    </PingTest>
                    <chkLocalHost>
                            <status>PASSED</status>
                    </chkLocalHost>
            </en5>
    </chknetconn>

I want to create an analysis that reads the PingTest and chkLocalHost status for each adaptor. The adaptor names (e.g. en4) will differ from computer to computer. I want each property to report the status of each adaptor as a separate answer, for example:

Property “PingTest”
Result:
en4: PASSED
en5: PASSED

Property “chkLocalHost”
Result:
en4: PASSED
en5: PASSED

I can get the status of the first adaptor using this relevance:

> q: following text of first "<status>" of preceding text of first "</status>" of following text of first "<PingTest>" of preceding text of first "</PingTest>" of following text of first "<en4>" of preceding text of first "</en4>" of concatenation "~~~" of lines of file "/tmp/test.xml"
> A: PASSED

I can read the unique adaptor names using this relevance:

q: (substrings separated by "~~~" whose (it contains regex "<en.*>")  of following text of first "<chknetconn>" of preceding text of first "</chknetconn>" of it) of (concatenation "~~~" of lines of file "/tmp/test.xml") as trimmed string
A: <en4>
A: <en5>
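To make the string slicing in these two statements easier to follow, here is a rough Python emulation (illustration only, not BigFix Relevance). It assumes `following text of first X` means "everything after the first X" and `preceding text of first X` means "everything before it", embeds the first sample instead of reading /tmp/test.xml, and uses hypothetical helper names (`following_text_of_first`, `preceding_text_of_first`, `between`).

```python
import re

# First sample from this post (stands in for: lines of file "/tmp/test.xml").
SAMPLE = """\
<chknetconn>
        <en4>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en4>
        <en5>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en5>
</chknetconn>
"""
LINES = SAMPLE.splitlines()


def following_text_of_first(marker, text):
    # Like Relevance, fail (ValueError here) when the marker is missing.
    return text[text.index(marker) + len(marker):]


def preceding_text_of_first(marker, text):
    return text[:text.index(marker)]


def between(start, end, text):
    # Mirrors 'following text of first start of preceding text of first end of text':
    # cut at the first `end`, then keep what follows the first `start`.
    return following_text_of_first(start, preceding_text_of_first(end, text))


doc = "~~~".join(LINES)  # concatenation "~~~" of lines

# Statement 1: PingTest status for the hard-coded adapter en4.
ping_en4 = between("<status>", "</status>",
                   between("<PingTest>", "</PingTest>",
                           between("<en4>", "</en4>", doc)))

# Statement 2: adapter opening tags found inside <chknetconn>.
body = between("<chknetconn>", "</chknetconn>", doc)
adapters = [piece.strip() for piece in body.split("~~~")
            if re.search(r"<en.*>", piece)]

print(ping_en4)  # PASSED
print(adapters)  # ['<en4>', '<en5>']
```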

I can’t find a way to use the second statement to filter the results in the first statement.

Aram · March 30, 2017, 2:07pm

Will <PingTest> always be the first node after the name of the adapter? If so, perhaps something like the following might work?

(preceding text of first ">" of following text of first "<" of previous line of it as trimmed string, preceding text of first "</status>" of following text of first "<status>" of next line of it as trimmed string) of lines whose (it as string contains "<PingTest>") of file "/tmp/test.xml"
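Aram's statement relies on line positions rather than matching tags. The Python sketch below (an emulation for illustration, not BigFix code) mimics it under the same assumption he states: the adapter's opening tag sits on the line just before `<PingTest>` and the status on the line just after. The function name `neighbor_line_results` is hypothetical.

```python
# First sample from this post (stands in for: lines of file "/tmp/test.xml").
SAMPLE = """\
<chknetconn>
        <en4>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en4>
        <en5>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en5>
</chknetconn>
"""
LINES = SAMPLE.splitlines()


def following_text_of_first(marker, text):
    return text[text.index(marker) + len(marker):]


def preceding_text_of_first(marker, text):
    return text[:text.index(marker)]


def neighbor_line_results(lines, tag="<PingTest>"):
    results = []
    for i, line in enumerate(lines):
        # Skip the first and last line so lines[i - 1] and lines[i + 1] exist;
        # Relevance would raise an error rather than wrap around.
        if tag not in line or i == 0 or i == len(lines) - 1:
            continue
        previous_line = lines[i - 1].strip()  # e.g. "<en4>"
        next_line = lines[i + 1].strip()      # e.g. "<status>PASSED</status>"
        name = preceding_text_of_first(">", following_text_of_first("<", previous_line))
        status = preceding_text_of_first(
            "</status>", following_text_of_first("<status>", next_line))
        results.append((name, status))
    return results


print(neighbor_line_results(LINES))  # [('en4', 'PASSED'), ('en5', 'PASSED')]
```

Note that `</PingTest>` does not contain the substring `<PingTest>`, so only the opening-tag lines are picked up.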

jgo · March 30, 2017, 2:20pm

Aram,
dude, sweet.
-jgo

MattPeterson · March 30, 2017, 2:36pm

PingTest should always be the first value, but there are other values for each adaptor we want to get info from, like chkLocalHost in the provided example.

Aram · March 30, 2017, 3:12pm

Assuming the order and existence of the fields is consistent, you can still leverage the same approach for chkLocalHost with something like:

(preceding text of first ">" of following text of first "</" of next line of it as trimmed string, preceding text of first "</status>" of following text of first "<status>" of previous line of it as trimmed string) of lines whose (it as string contains "</chkLocalHost>") of file "R:\temp\test.xml"

MattPeterson · March 30, 2017, 5:27pm

Thanks, that works for this example, but we could have additional fields for each adaptor depending on the type of adaptor, so chkLocalHost will not always be the last item. Here is an example of what the XML could look like on some systems:

    <chknetconn>
            <en4>
                    <PingTest>
                            <status>PASSED</status>
                    </PingTest>
                    <chkLocalHost>
                            <status>PASSED</status>
                    </chkLocalHost>
                    <Duplex>
                            <status>PASSED</status>
                    </Duplex>
                    <intSpeed>
                            <speed>100</speed>
                            <status>PASSED</status>
                    </intSpeed>
            </en4>
            <en5>
                    <PingTest>
                            <status>PASSED</status>
                    </PingTest>
                    <chkLocalHost>
                            <status>PASSED</status>
                    </chkLocalHost>
            </en5>
    </chknetconn>

Marjan · March 30, 2017, 11:45pm

If the file is not too big and you don’t have other tags that start with “<en”, you can try something like:

> ("en" & preceding text of first ">" of it, following text of first "<status>" of preceding text of first "</status>" of it) of substrings separated by "<en" whose (it contains "<status>" ) of concatenation "" of lines of file "/tmp/test.xml"

There are other slightly more complicated ways of doing this for any arbitrary tag but I think this one should work well in a lot of cases.
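Marjan's expression splits the whole file on "<en" so that each piece holds one adapter. A Python emulation for illustration (not BigFix code; function names are hypothetical), run here against the extended sample, shows the behaviour: the name comes from the text before the first ">" in each piece and the status from the first <status> in that piece, so each adapter yields one status (PingTest's, in this sample).

```python
# Extended sample from this thread (stands in for: lines of file "/tmp/test.xml").
SAMPLE = """\
<chknetconn>
        <en4>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
                <Duplex>
                        <status>PASSED</status>
                </Duplex>
                <intSpeed>
                        <speed>100</speed>
                        <status>PASSED</status>
                </intSpeed>
        </en4>
        <en5>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en5>
</chknetconn>
"""
LINES = SAMPLE.splitlines()


def following_text_of_first(marker, text):
    return text[text.index(marker) + len(marker):]


def preceding_text_of_first(marker, text):
    return text[:text.index(marker)]


def first_status_per_adapter(lines):
    doc = "".join(lines)  # concatenation "" of lines
    results = []
    for piece in doc.split("<en"):  # substrings separated by "<en"
        if "<status>" not in piece:  # whose (it contains "<status>")
            continue
        name = "en" + preceding_text_of_first(">", piece)  # "4>..." -> "en4"
        status = following_text_of_first(
            "<status>", preceding_text_of_first("</status>", piece))
        results.append((name, status))
    return results


print(first_status_per_adapter(LINES))  # [('en4', 'PASSED'), ('en5', 'PASSED')]
```

The caveat about other tags matters because any tag starting with "<en" would start a new piece; closing tags such as "</en4>" are safe since they begin with "</".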

JasonWalker · March 31, 2017, 3:26am

I feel that this can probably be cleaned up quite a bit but don’t know if I can keep looking at it right now.

There’s probably a very elegant regular expression to handle it. Here’s a not-very-elegant regex to handle single-digit interface names, en0 through en9 (the last generated class, "[10]", matches en1 or en0 rather than en10):

q: (item 0 of it, preceding text of first ">" of following text of first "<" of item 1 of it, parenthesized parts 2 of matches(regex("(<status>)(.*)(</status>)")) of item 1 of it) of (following text of first "<" of preceding text of first ">" of it, matches(regexes(("<" & it & ">.*</" & it & ">" )of ("PingTest";"chkLocalHost"))) of it) of ((matches(regexes(("<en[" & it as string & "]>.*</en[" & it as string & "]>") of integers in (1,10))) of it as string)) of (matches(case insensitive regex("<chknetconn>.*</chknetconn>")) of concatenation of lines of file "c:\temp\test.xml")
A: en4, PingTest, PASSED
A: en4, chkLocalHost, PASSED
A: en5, PingTest, PASSED
A: en5, chkLocalHost, PASSED
T: 2.418 ms
I: plural ( substring, substring, substring )
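The same extract-by-regex idea can be sketched with Python's `re` module. This is a swapped-in variant, not Jason's exact expression: a backreference, `<(en\d+)>(.*?)</\1>`, finds every adapter block whatever its number instead of building one regex per digit, and non-greedy `.*?` keeps each match inside its own block. Whether BigFix's regex engine accepts the same backreference syntax is not confirmed here; `statuses` is a hypothetical name.

```python
import re

# Extended sample from this thread (stands in for: lines of file "c:\temp\test.xml").
SAMPLE = """\
<chknetconn>
        <en4>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
                <Duplex>
                        <status>PASSED</status>
                </Duplex>
                <intSpeed>
                        <speed>100</speed>
                        <status>PASSED</status>
                </intSpeed>
        </en4>
        <en5>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en5>
</chknetconn>
"""
LINES = SAMPLE.splitlines()

# \1 must repeat the captured adapter name, so <en4> pairs with </en4>.
ADAPTER_RE = re.compile(r"<(en\d+)>(.*?)</\1>")
STATUS_RE = re.compile(r"<status>(.*?)</status>")


def statuses(lines, tests=("PingTest", "chkLocalHost")):
    doc = "".join(lines)  # concatenation of lines (no separator, so no newlines)
    body = re.search(r"<chknetconn>(.*)</chknetconn>", doc, re.IGNORECASE)
    if body is None:
        return []
    results = []
    for adapter in ADAPTER_RE.finditer(body.group(1)):
        name, adapter_body = adapter.group(1), adapter.group(2)
        for test in tests:
            tag = re.escape(test)
            block = re.search("<" + tag + ">(.*?)</" + tag + ">", adapter_body)
            if block is None:
                continue  # this adapter has no such test block
            status = STATUS_RE.search(block.group(1))
            if status is not None:
                results.append((name, test, status.group(1)))
    return results


for row in statuses(LINES):
    print(row)
```

Passing a different `tests` tuple, e.g. `("intSpeed",)`, returns only the adapters that actually contain that block (en4 in this sample).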

MattPeterson · April 3, 2017, 2:13pm

Nice work guys! Both solutions worked. I ended up using Marjan’s logic; here’s the statement I’m using:

> q: ("en" & preceding text of first ">" of it & ": " & following text of first "<status>" of preceding text of first "</status>" of it of following text of first "<PingTest>" of it) of substrings separated by "<en" whose (it contains "<PingTest>" ) of concatenation "" of lines of file "c:\test\test.xml"
> A: en4: PASSED
> A: en5: PASSED
> T: 1.702 ms
> I: plural string
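This final statement differs from Marjan's only in that it reads the status that follows a specific tag. Parameterizing that tag, as in the Python emulation below (illustration only; `status_by_test` is a hypothetical name), shows that the same shape covers chkLocalHost, Duplex or intSpeed. In the analysis itself that would still mean one property per tag, with the literal "<PingTest>" replaced in both places.

```python
# Extended sample from this thread (stands in for: lines of file "c:\test\test.xml").
SAMPLE = """\
<chknetconn>
        <en4>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
                <Duplex>
                        <status>PASSED</status>
                </Duplex>
                <intSpeed>
                        <speed>100</speed>
                        <status>PASSED</status>
                </intSpeed>
        </en4>
        <en5>
                <PingTest>
                        <status>PASSED</status>
                </PingTest>
                <chkLocalHost>
                        <status>PASSED</status>
                </chkLocalHost>
        </en5>
</chknetconn>
"""
LINES = SAMPLE.splitlines()


def following_text_of_first(marker, text):
    return text[text.index(marker) + len(marker):]


def preceding_text_of_first(marker, text):
    return text[:text.index(marker)]


def status_by_test(lines, test):
    doc = "".join(lines)  # concatenation "" of lines
    open_tag = "<" + test + ">"
    results = []
    for piece in doc.split("<en"):  # one piece per adapter
        if open_tag not in piece:   # whose (it contains "<PingTest>")
            continue
        after_tag = following_text_of_first(open_tag, piece)
        status = following_text_of_first(
            "<status>", preceding_text_of_first("</status>", after_tag))
        results.append("en" + preceding_text_of_first(">", piece) + ": " + status)
    return results


for test in ("PingTest", "chkLocalHost", "Duplex"):
    print(test, status_by_test(LINES, test))
# PingTest ['en4: PASSED', 'en5: PASSED']
# chkLocalHost ['en4: PASSED', 'en5: PASSED']
# Duplex ['en4: PASSED']
```

One caveat carries over from the Relevance: if a tag had no <status> of its own, the first "</status>" after it would belong to a later tag in the same adapter block, so the reported value would be that tag's status.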