Multiple regex matches and XML parsing

Hello!

Can you please help me understand why

following texts of firsts ("ID") of 
(
unique values of locked lines WHOSE (it contains (regex "([Ii][Dd](\W|\w|\S|\s|)=)")) of files ((pathnames of files of folders of folder "\\network\c$\Directory\" as string) whose (it contains (regex ".exe.config$")))
)

fails with:
the property 'firsts <regular expression> of <string with multiplicity>' is not defined

when I use a regex in “firsts” instead of a plain string (on a Windows machine), i.e. when the first line becomes:
following texts of firsts (regex "ID") of ?

Sample of results using the successfully executed statement:

=“1245810244”>
=“34” touchscreen=“false”/>
=“384145601” boothNumber=“11” window=“31” Code=“” area=“” section=“” location=“” touchScreen=“False” secureWorkstation=“false” cardReaderEnabled=“false”>
=“38” touchscreen=“false” />

What I’m really after is the value associated with the "([Ii][Dd](\W|\w|\S|\s|)=)" regex though.

Edit: Adding complete lines found (%09 = Tab)

%09%09%09<App tipOfTheDayPath=“TipOfTheDay.xml” workstationID=“1245810244”><display taskbar=“True”><position width=“1920” maximized=“True” height=“1009” locationX=“1912” locationY=“-8” /><homepage floorStatsCollapse=“False” revenueStatusCollapse=“False” auditStatusCollapse=“False” rvSlotsCollapse=“False” rvSlotTypesCollapse=“False” rvMultiGamesCollapse=“False” rvLinksCollapse=“False” rvSlotGroupsCollapse=“False” tipOfDayCollapse=“False” /></display></App>

%09%09%09<App locationID=“34” touchscreen=“false”/>

%09%09%09<App workstationID=“384145601” boothNumber=“11” window=“31” casinoCode=“” area=“” section=“” location=“” touchScreen=“False” secureWorkstation=“false” cardReaderEnabled=“false”>

%09%09%09<App locationId=“-1” touchscreen=“false” mmtDebug=“false” primaryPit=“” secondaryPit=“” window=“-1”/>

%09%09%09<App locationId=“-1” touchscreen=“false” mmtDebug=“false” primaryPit=“” secondaryPit=“” window=“-1”/>

 <App locationID="38" touchscreen="false" />

I’m having difficulty understanding what you are trying to do, but the first problem is that you cannot apply a regex to a plural; you need to refactor that first one as

( following texts of firsts ("ID") of it )

When wrapped in parentheses this way the regex operates on each string one at a time, instead of failing to apply a regex to the plural…I think.

In any case for what you’re doing I think the XML inspectors would probably suit better.

Thanks Jason.

That didn’t do the trick (same error), but I’ll begin researching the XML inspectors.

Ultimately I’m attempting to get the value set for any variable ending with “id” (i.e., 1245810244; 34; 384145601; and 38 in the examples above) in these .config files. Haven’t bothered working on the “of preceding __” part of the query since this is where I’m getting stopped.

Not finding the info about xml inspectors on developer.bigfix.com very helpful and have no previous context for working with xml (also found no forum posts addressing issues similar to mine)… could you please say more about the limitations of using regex on plurals, and potential workarounds?

(Edit: Found a post where you provided some XML-related links: “Relevance Challenge December 2019 BONUS: Parsing Paragraphs (answer provided)”, post #10 by JasonWalker. Thanks!)

Can you provide an example of your XML…maybe as a new forum topic rather than a tag on to an old posting? Perhaps the community can help with some XML parsing methods

Try this 5 minute video primer on XML inspectors

Thanks! I keep forgetting about that channel… This is exactly what I was hoping to come across!

One point you make merits a follow-up on my part though; you mention there being tutorials for how to use inspectors at https://developer.bigfix.com but I’ve only ever been able to find very high level examples of how to reference an inspector property (via the Inspector Search), and never examples of how to apply them.

What am I missing in trying to find those tutorials?

Thanks anyway for the reply and recommendation!

Well, that’s embarrassing.

I have very clear memories of going through the tutorials, including XML, as part of a relevance guide, when I started as a BigFix customer somewhere around 2010.

I don’t see any of that in the tutorials at developer.bigfix.com now, though. When I went through this, there was a set of PDF guides, like those in the Wiki at Welcome to Wikis, but I couldn’t find much of an XML example in the versions on that page either. Those versions are either older or newer than what I learned on, and it looks like XML is not in the guide.

I’m sure we can find some examples here in the forum though, the trick is maybe finding something that’s not too complex to use as an example.

I suppose I should say that, in general, you should go through the Guide at https://developer.bigfix.com/relevance/guide/ and the Tutorials at https://developer.bigfix.com/relevance/tutorial.html before getting too bogged down in the Reference and Search…it’s just that, in this specific case, there don’t appear to be any XML tutorials left there.

They are definitely worth the read. I read through them again when my current employer took my advice and invested in BigFix (I hadn’t done anything with it in nearly five years) but find they don’t provide enough context for me to be able to do much with what’s presented. Which is why I appreciate this community so much. As I search for topics/posts relevant to what I’m up to I often learn about things I have no exposure to and frequently happen upon some real gems :smiley:

On that note; is there really no way you can think of to overcome the limitation of using regex with plural results?

This lesson on xml has been valuable to be sure, but it’s a new arena for me and without a lot of trial and error I get the sense I’ll be hard pressed to capture the details of all potentially applicable nodes in the list of application configs I’m trying to interrogate with these analyses…

Thanks as ever!

This is definitely a new journey into XML and xpath for me. Assuming your XML, once wrapped in a root node, looks something like the sample below, it appears to be parsable via relevance and an xpath inspector to get the text of each attribute whose name contains “ID”.

Your XML (assumption from your first edited post)

<xml>
<App tipOfTheDayPath="TipOfTheDay.xml" workstationID="1245810244">
	<display taskbar="True">
		<position width="1920" maximized="True" height="1009" locationX="1912" locationY="-8"/>
		<homepage floorStatsCollapse="False" revenueStatusCollapse="False" auditStatusCollapse="False" rvSlotsCollapse="False" rvSlotTypesCollapse="False" rvMultiGamesCollapse="False" rvLinksCollapse="False" rvSlotGroupsCollapse="False" tipOfDayCollapse="False"/>
	</display>
</App>
<App locationID="34" touchscreen="false"/>
<App workstationID="384145601" boothNumber="11" window="31" casinoCode="" area="" section="" location="" touchScreen="False" secureWorkstation="false" cardReaderEnabled="false"/>
<App locationId="-1" touchscreen="false" mmtDebug="false" primaryPit="" secondaryPit="" window="-1"/>
<App locationId="-1" touchscreen="false" mmtDebug="false" primaryPit="" secondaryPit="" window="-1"/>
<App locationID="38" touchscreen="false"/>
</xml>

After some Googling for possible xpath syntaxes I found https://stackoverflow.com/questions/47650002/find-xpath-attribute-name-contains-specific-string which, when implemented as a relevance xpath inspection gives us

Q: (xpaths "/xml/App/@*[contains(local-name(), 'ID')]" of it as text) of (xml document of file "D:\temp\something.xml")
A: 1245810244
A: 34
A: 384145601
A: 38
T: 0.943 ms
I: plural string

I think the trick is to figure out the right xpath expressions to perform the searches you need; once those are figured out, the relevance part should fall into place.
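
One detail worth noting about the xpath above: contains() is case-sensitive, which is why the two locationId="-1" attributes do not appear in the answers. As a rough way to experiment with that outside BigFix, here is a minimal Python sketch (not Relevance, and not an xpath inspector). It assumes a trimmed version of the sample XML above; the function name id_attribute_values and its parameters are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the assumed sample document (illustrative only).
SAMPLE = """<xml>
<App tipOfTheDayPath="TipOfTheDay.xml" workstationID="1245810244">
    <display taskbar="True"/>
</App>
<App locationID="34" touchscreen="false"/>
<App workstationID="384145601" boothNumber="11" window="31"/>
<App locationId="-1" touchscreen="false" window="-1"/>
<App locationID="38" touchscreen="false"/>
</xml>"""


def id_attribute_values(xml_text, needle="ID", ignore_case=False):
    """Return values of attributes on /xml/App whose name contains `needle`."""
    root = ET.fromstring(xml_text)
    wanted = needle.lower() if ignore_case else needle
    values = []
    for app in root.findall("App"):  # direct <App> children of the root only
        for name, value in app.attrib.items():
            haystack = name.lower() if ignore_case else name
            if wanted in haystack:
                values.append(value)
    return values


case_sensitive = id_attribute_values(SAMPLE)
case_insensitive = id_attribute_values(SAMPLE, ignore_case=True)
print(case_sensitive)
print(case_insensitive)
```

The case-sensitive call mirrors the 'ID' test in the xpath; the case-insensitive call additionally picks up locationId, which may or may not be wanted depending on which config values you are after.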

I’ve moved this to a new topic because I was confusing the original topic with the ongoing discussion.

Getting back to your first question

I had some difficulty trying to piece out what you were doing…you had a quoted piece of relevance

following texts of firsts ("ID") of 
(
unique values of locked lines WHOSE (it contains (regex "([Ii][Dd](\W|\w|\S|\s|)=)")) of files ((pathnames of files of folders of folder "\\network\c$\Directory\" as string) whose (it contains (regex ".exe.config$")))
)

which works for me when I built a test case for it:

following texts of firsts ("ID") of 
(
unique values of locked lines WHOSE (it contains (regex "([Ii][Dd](\W|\w|\S|\s|)=)")) of find files "*.xml" of folders "c:\temp\multifile"
)

but then I see further in your post that this is not the statement that is giving the error you describe, but some variation of it where you’re trying to use a regex for ID…but that exact statement isn’t actually posted.

Please, for everyone watching, you’ll get the best answers when you can create the simplest possible statement that demonstrates the problem, along with a simple set of test data, and post those exactly here.

What I think you might be trying (for my test dataset) is

following texts of firsts (regex "ID") of 
(
unique values of locked lines WHOSE (it contains (regex "([Ii][Dd](\W|\w|\S|\s|)=)")) of find files "*.xml" of folders "c:\temp\multifile"
)

At least, this throws the same error you describe.
This has two problems:

  • A regex is not a string, so it doesn’t have following texts. What you need are the following texts of the matches of the regex.
  • A regex doesn’t apply to a plural group of strings; you have to apply the regex to each string individually, by wrapping it as (matches (regex "<expression>") of it).

That gives either

following texts of (first matches (regex "ID") of it) of 
(
unique values of locked lines WHOSE (it contains (regex "([Ii][Dd](\W|\w|\S|\s|)=)")) of find files "*.xml" of folders "c:\temp\multifile"
)

or

following texts of firsts (matches (regex "ID") of it) of 
(
unique values of locked lines WHOSE (it contains (regex "([Ii][Dd](\W|\w|\S|\s|)=)")) of find files "*.xml" of folders "c:\temp\multifile"
)

There’s a nuance between the two, though I’m not sure whether it matters for your data.

The (first matches (regex "ID") of it) will only match the regular expression once per line, so only one ‘ID’ field on each line will end up in the result.

The (matches (regex "ID") of it) would match every time the ID field appears on a line, so one line might produce two (or more) matches, and each of those matches has a ‘following text’ so you’ll end up with multiple results per line.
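
The same first-match versus every-match distinction can be tried out in a general-purpose regex library. The following is a minimal Python sketch (not Relevance), assuming two sample lines; the first line, with two ID attributes on one line, is hypothetical and exists only to show the difference. The helper names are also made up for illustration.

```python
import re

LINES = [
    # Hypothetical line carrying two ID attributes, to show the difference.
    '<App workstationID="1245810244" locationID="34"/>',
    '<App locationID="38" touchscreen="false"/>',
]
ID_PATTERN = re.compile(r"ID")


def following_text_first(line):
    """Text after the first match only (at most one result per line)."""
    m = ID_PATTERN.search(line)
    return [line[m.end():]] if m else []


def following_texts_all(line):
    """Text after every match (one result per match on the line)."""
    return [line[m.end():] for m in ID_PATTERN.finditer(line)]


# Apply the regex to each line individually, then flatten the results.
first_only = [t for line in LINES for t in following_text_first(line)]
every_match = [t for line in LINES for t in following_texts_all(line)]
print(len(first_only), len(every_match))
```

With the first-match helper each line contributes at most one result; with the every-match helper the first line contributes two, so the totals differ.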

Hope this helps!

Thank you! I’ll be using this in my research of how to use the xml inspector for sure :+1:

The original question could have been phrased better, you’re right.

THANK you; regex is SUCH an awesome method to have integrated with BF, I love using it, but obviously I still have a ways to go in fully comprehending its implementations. I had inserted “match” but not “matches” in my testing, and probably not in the right place - your second statement here is exactly what I needed but the explanation will undoubtedly serve me even better - thanks again!
