Formatting lines extracted from an XML file on a Linux machine

Hello. I’m attempting to extract employee information from an XML file residing on a Linux machine. I’m able to gather the data, but now I want to comma-separate the output and add a carriage return/line feed after a specific data element.

Here is what I have so far from the file, using the Fixlet Debugger:
Q: (preceding texts of matches (regexes "/FirstName|/LastName|/JobCode|/HomePhone") of (following texts of matches (regexes "FirstName|LastName|JobCode|HomePhone") of lines of file "/tmp/employees.xml"))
A: DOE
A: JOHN
A: 1234567890
A: 1234
A: DOE
A: JANE
A: 9876543210
A: 4321

I’d like the output to look like this:
A: DOE, JOHN, 1234567890, 1234
A: DOE, JANE, 9876543210, 4321

Is there a way to comma-separate these values and then add a carriage return/line feed after every 4th value? Thanks in advance.

Try this:

concatenation "," of (preceding texts of matches (regexes "/FirstName|/LastName|/JobCode|/HomePhone") of (following texts of matches (regexes "FirstName|LastName|JobCode|HomePhone") of lines of file "/tmp/employees.xml"))

Haven’t tested it, but I think it’ll work.

Yes, thank you so much; this helps. I get the following output now, which is much better: :smile:
A: DOE,JOHN,1234567890,1234,DOE,JANE,9876543210,4321

Now I’d like a carriage return/line feed between employees, so the output looks like this:
A: DOE,JOHN,1234567890,1234
A: DOE,JANE,9876543210,4321

I’m not sure if this is even possible, but any ideas are greatly appreciated. Thanks again for the input.
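For anyone who wants to see the target reshaping outside the Fixlet Debugger, here is a minimal Python sketch (not relevance) of the grouping being asked for: take the flat list of values in extraction order, cut it into rows of four, join each row with a comma, and join the rows with CR LF. It assumes every employee contributes exactly four values in the same order; the helper name `format_rows` is made up for this illustration.

```python
# Values in the order the extraction returns them (copied from the question).
values = ["DOE", "JOHN", "1234567890", "1234",
          "DOE", "JANE", "9876543210", "4321"]


def format_rows(flat, width=4, sep=", ", eol="\r\n"):
    """Hypothetical helper: cut a flat list into rows of `width` values,
    join each row with `sep`, and join the rows with `eol` (CR LF)."""
    if len(flat) % width != 0:
        # A missing field would shift every later row, so refuse to guess.
        raise ValueError("number of values is not a multiple of the row width")
    rows = [sep.join(flat[i:i + width]) for i in range(0, len(flat), width)]
    return eol.join(rows)


print(format_rows(values))
```

If any employee is missing a field, fixed-width grouping shifts every following row, which is one reason to capture all four fields per employee in a single regex match instead.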

Maybe this:

concatenation ", " of (preceding texts of matches (regexes "/FirstName|/LastName|/JobCode|/HomePhone") & "%0d%0a" of (following texts of matches (regexes "FirstName|LastName|JobCode|HomePhone") of lines of file "/tmp/employees.xml"))

EDIT: I think the CR should be in a different place.

Once you’ve figured out where to place them, the CR and LF can be inserted using the hex codes "%0D%0A".
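As a quick check of what those hex codes stand for: `%0D` is byte 0x0D (carriage return) and `%0A` is byte 0x0A (line feed). The Python sketch below decodes the same percent-encoded text with `urllib.parse.unquote` and uses the result as the separator between rows. It only illustrates the characters involved and assumes, as the reply above says, that a relevance string literal treats "%0D%0A" the same way.

```python
from urllib.parse import unquote

# "%0D%0A" is the percent-encoded (hex) form of carriage return + line feed.
crlf = unquote("%0D%0A")
print([hex(ord(c)) for c in crlf])  # ['0xd', '0xa']

# Used as the separator between per-employee rows:
rows = ["DOE, JOHN, 1234567890, 1234", "DOE, JANE, 9876543210, 4321"]
text = crlf.join(rows)
print(text)
```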

Thanks. I’ll give this a try. Much appreciated!

This seemed to work on Windows:

(concatenation "," of (parenthesized part 1 of it; parenthesized part 2 of it; parenthesized part 3 of it; parenthesized part 4 of it) of it) of matches (case insensitive regex "<LastName>([^<]*)</LastName>.*<FirstName>([^<]*)</FirstName>.*<HomePhone>([^<]*)</HomePhone>.*<JobCode>([^<]*)</JobCode>.*") of concatenation of lines of file "C:\Temp\employee.xml"

However, when I tried the same on Linux, the last phone number and job code in the file were returned as results for all employees in the file.

rpr,
When I ran that statement, I got some odd results… Tracking it down, I saw that the matches coming back from the regex extended all the way to the end of the string…

q:  matches (case insensitive regex "<LastName>([^<]*)</LastName>.*<FirstName>([^<]*)</FirstName>.*<HomePhone>([^<]*)</HomePhone>.*<JobCode>([^<]*)</JobCode>.*") of concatenation of lines of file "d:\Temp\employees.xml"

A: <LastName>Employee1</LastName><FirstName>DOE</FirstName><HomePhone>HomePhoneEmployee1</HomePhone><JobCode>JObcode1</JobCode></employee><employee><LastName>Employee2</LastName><FirstName>JANE</FirstName><HomePhone>HomePhoneEmployee2</HomePhone><JobCode>JObcode2</JobCode></employee><employee><LastName>Employee3</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee3</HomePhone><JobCode>JObcode3</JobCode></employee><employee><LastName>Employee4</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee4</HomePhone><JobCode>JObcode4</JobCode></employee><employee><LastName>Employee5</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee5</HomePhone><JobCode>JObcode5</JobCode></employee><employee><LastName>Employee6</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee6</HomePhone><JobCode>JObcode6</JobCode></employee></xml>
A: <LastName>Employee2</LastName><FirstName>JANE</FirstName><HomePhone>HomePhoneEmployee2</HomePhone><JobCode>JObcode2</JobCode></employee><employee><LastName>Employee3</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee3</HomePhone><JobCode>JObcode3</JobCode></employee><employee><LastName>Employee4</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee4</HomePhone><JobCode>JObcode4</JobCode></employee><employee><LastName>Employee5</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee5</HomePhone><JobCode>JObcode5</JobCode></employee><employee><LastName>Employee6</LastName><FirstName>Joan</FirstName><HomePhone>HomePhoneEmployee6</HomePhone><JobCode>JObcode6</JobCode></employee></xml>
A: ~ 
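To make the greedy-match problem concrete, here is a small Python reproduction using a made-up two-employee string in the same shape as the file. Python's `re` is not the BigFix regex engine, so the number of matches differs from the debugger output above (Python returns a single match that swallows the rest of the string), but the cause is the same: each `.*` grabs as much as it can, so every field after `LastName` is taken from the last record.

```python
import re

# Hypothetical sample: <employee> records concatenated into one string,
# as "concatenation of lines of file" would produce (values are made up).
xml = (
    "<employee><LastName>Employee1</LastName><FirstName>DOE</FirstName>"
    "<HomePhone>Phone1</HomePhone><JobCode>Job1</JobCode></employee>"
    "<employee><LastName>Employee2</LastName><FirstName>JANE</FirstName>"
    "<HomePhone>Phone2</HomePhone><JobCode>Job2</JobCode></employee>"
)

# The original pattern: greedy ".*" between the fields and at the end.
greedy = re.compile(
    r"<LastName>([^<]*)</LastName>.*<FirstName>([^<]*)</FirstName>"
    r".*<HomePhone>([^<]*)</HomePhone>.*<JobCode>([^<]*)</JobCode>.*",
    re.IGNORECASE,
)

m = greedy.search(xml)
# LastName comes from the first record; each greedy ".*" then backtracks only
# as far as the LAST occurrence of the next tag, so the other fields come
# from the final record.
print(m.groups())           # ('Employee1', 'JANE', 'Phone2', 'Job2')
print(m.end() == len(xml))  # True: the trailing ".*" runs to the end
```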

So I decided to take it in a slightly different direction:
(parenthesized part 1 of it, parenthesized part 2 of it, parenthesized part 3 of it, parenthesized part 4 of it) of (matches (case insensitive regex "<LastName>([^<]*)</LastName><FirstName>([^<]*)</FirstName><HomePhone>([^<]*)</HomePhone><JobCode>([^<]*)</JobCode>") of (concatenation of lines of file "d:\Temp\employees.xml"))

Note the changed regex: "<LastName>([^<]*)</LastName><FirstName>([^<]*)</FirstName><HomePhone>([^<]*)</HomePhone><JobCode>([^<]*)</JobCode>"
This provided the following answers on Windows:

A: Employee1, DOE, HomePhoneEmployee1, JObcode1
A: Employee2, JANE, HomePhoneEmployee2, JObcode2
A: Employee3, Joan, HomePhoneEmployee3, JObcode3
A: Employee4, Joan, HomePhoneEmployee4, JObcode4
A: Employee5, Joan, HomePhoneEmployee5, JObcode5
A: Employee6, Joan, HomePhoneEmployee6, JObcode6

And it appears to come back the same on Ubuntu! :smile:
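Here is a Python sketch of the same idea, again with a made-up sample string rather than the real file. Because the four elements must sit next to each other, a match cannot run on into the next `<employee>` record, and each match's four capture groups stand in for `parenthesized part 1` through `parenthesized part 4`. Joining the groups with ", " and the rows with CR LF gives the layout asked for at the top of the thread. This only illustrates the regex behavior; it is not relevance.

```python
import re

# Hypothetical sample in the same shape as the file (values are made up).
xml = (
    "<employee><LastName>Employee1</LastName><FirstName>DOE</FirstName>"
    "<HomePhone>Phone1</HomePhone><JobCode>Job1</JobCode></employee>"
    "<employee><LastName>Employee2</LastName><FirstName>JANE</FirstName>"
    "<HomePhone>Phone2</HomePhone><JobCode>Job2</JobCode></employee>"
)

# Adjacent-tags pattern from the reply: no ".*", so a match ends at </JobCode>.
tight = re.compile(
    r"<LastName>([^<]*)</LastName><FirstName>([^<]*)</FirstName>"
    r"<HomePhone>([^<]*)</HomePhone><JobCode>([^<]*)</JobCode>",
    re.IGNORECASE,
)

# findall returns one tuple of the four capture groups per employee.
rows = [", ".join(groups) for groups in tight.findall(xml)]
output = "\r\n".join(rows)  # CR LF between employees
print(output)
```

One caveat, assuming the real file is pretty-printed: `concatenation of lines of file` drops the line breaks but keeps any indentation, so whitespace between the tags would stop an adjacent-tags pattern from matching. In Python, allowing optional whitespace (for example `</LastName>\s*<FirstName>`) covers that case; I haven't checked how the BigFix regex engine handles `\s`.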

Let me know if this works for you
-Jgo