Parsing text ... again

(imported topic written by jeko1791)

Hi,

I’ve done some text file parsing before, but am struggling with pulling a piece of information out of the middle of a line. Here are the first few lines of the file (from a Solaris “prtdiag -v”):

System Configuration: Sun Microsystems sun4u Sun Fire 280R (2 X UltraSPARC-III+)

System clock frequency: 150 MHz

Memory size: 2048 Megabytes

I am trying to pull out just the “Sun Microsystems” and just the “Sun Fire 280R” from the first line, but I cannot rely on the “sun4u” being constant (it could be something else, like “i86pc” or “sun4us”) and I cannot rely on the “(2 X UltraSPARC-III+)” being there either. I thought about hopping a number of spaces from a specific spot, but didn’t know how to count through them in relevance.

Any ideas on how to grab this text out of this line?

Thanks

(imported comment written by SystemAdmin)

If you think this is a typical example, the following might work for you, assuming that the first 41 characters are standard, the ‘sun4u’ entries do not contain spaces, and the trailing information is in parentheses. I would normally do a bit of background work* to see whether the “Sun Microsystems” entry is typical (for your machines), whether the two spaces in the middle of the string are always present, etc.

( IF it contains " (" THEN preceding text of first " (" of it ELSE it ) of following text of first " " of following text of position 41 of line whose (it starts with "System Configuration:") of file "prtdiag_output"

So, assuming the first 41 chars are standard, pull the specific line, take the trailing string after the known length of the beginning, then extract the following text after the first space found, with the final option to lop off any data from the end if there is a " (" in the string.
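For anyone who finds the steps easier to follow outside of relevance, here is a minimal Python sketch of the same idea. It is only an illustration, not the relevance itself: the function name `model_by_offset` is made up, and the sample line assumes the raw prtdiag output has two spaces after the colon and two after the manufacturer (which is what makes the prefix 41 characters long; the forum rendering may have collapsed them).

```python
# Assumed raw line: two spaces after "System Configuration:" and after the
# manufacturer, so the architecture token starts at offset 41.
SAMPLE = ("System Configuration:  Sun Microsystems  "
          "sun4u Sun Fire 280R (2 X UltraSPARC-III+)")


def model_by_offset(line, offset=41):
    """Hypothetical helper mirroring the offset-based steps described above."""
    rest = line[offset:]               # text after the fixed-width prefix
    _, _, rest = rest.partition(" ")   # drop the architecture token
    if " (" in rest:                   # lop off the parenthesized tail, if any
        rest = rest.split(" (", 1)[0]
    return rest


print(model_by_offset(SAMPLE))
```

If the manufacturer string is ever a different length, the offset no longer lines up, which is the weakness of this approach.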

*A thought springs to mind when I read questions like this…do I actually have enough information to make a proper decision? The answer here is probably not. My concern is not about the data you’ve given for one machine, but for all the other data you’re going to get back from the other machines.

Normally I would create an analysis with relevance to pull the data I’m going to query. I examine all the unique examples and try to see where the similar instances of spacing, phrases, special characters, etc. might help me parse the line reliably. Sometimes you just have to grab the best set of data you can and just use it.

-Jim

(imported comment written by jeko1791)

Thanks Jim.

Reading your post, a lightbulb went off in my head. I don’t know what the make and model will always be, or even if the parenthesized piece will always be there, but I do know the limited variations of the architecture string, so I can use this as a key to guide my parsing. I believe this will work very well to determine the Make (before the architecture key) and Model (after the architecture key). Even if it’s not 100%, I can monitor the built-in “OS Architecture - Unix” property to see if any other variations pop up and update my properties accordingly.

Manufacturer

q: preceding texts of firsts ("sun4u";"sun4us";"i86pc";"sun4v") of following texts of first "System Configuration:" of lines of file "prtdiag.txt" as trimmed string

A: Sun Microsystems

Model

q: following texts of firsts ("sun4u";"sun4us";"i86pc";"sun4v") of lines of file "prtdiag.txt" as trimmed string

A: Sun Fire 280R (2 X UltraSPARC-III+)

Model if you want to exclude the stuff in the parentheses

q: preceding texts of last (if (it contains "(") then ("(") else (" ")) of following texts of firsts ("sun4u";"sun4us";"i86pc";"sun4v") of lines of file "prtdiag.txt" as trimmed string

A: Sun Fire 280R
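As a cross-check of the architecture-key idea, here is a rough Python sketch of the same split. The names `ARCH_KEYS` and `split_make_model` are invented for the illustration, and the key list is just the four strings used above. It compares whole words, so "sun4u" is never matched inside "sun4us".

```python
# Architecture strings taken from the relevance above; extend as new ones appear.
ARCH_KEYS = ("sun4u", "sun4us", "i86pc", "sun4v")
PREFIX = "System Configuration:"


def split_make_model(line, keep_parens=False):
    """Return (make, model) using the architecture token as the separator."""
    if not line.startswith(PREFIX):
        return None
    words = line[len(PREFIX):].split()
    for i, word in enumerate(words):
        if word in ARCH_KEYS:                  # whole-word comparison
            make = " ".join(words[:i])         # everything before the key
            model = " ".join(words[i + 1:])    # everything after the key
            if not keep_parens:
                model = model.split(" (", 1)[0]
            return make, model
    return None                                # no known architecture key found


line = "System Configuration: Sun Microsystems sun4u Sun Fire 280R (2 X UltraSPARC-III+)"
print(split_make_model(line))
```

Returning None when no key is found makes it easy to spot machines that report an architecture string not yet in the list.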

Now I just need to find the command that outputs similar info for AIX and HP-UX :)

(imported comment written by jessewk)

As your parsing requirements get more complicated, you will probably want to start using the regex inspectors instead of the string manipulation inspectors.
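To give a feel for what a regex buys you here, below is a small sketch in Python rather than with the relevance regex inspectors (whose syntax may differ). The pattern, `CONFIG_RE`, and `parse_config` are assumptions made up for this example; the architecture alternatives are the ones mentioned earlier in the thread.

```python
import re

# One pattern captures make, architecture, and model; the parenthesized
# tail is optional and is left out of the model group.
CONFIG_RE = re.compile(
    r"^System Configuration:\s*"
    r"(?P<make>.+?)\s+"
    r"(?P<arch>sun4us|sun4u|sun4v|i86pc)\s+"
    r"(?P<model>.+?)"
    r"(?:\s*\(.*\))?\s*$"
)


def parse_config(line):
    """Return a dict with make, arch, and model, or None if the line does not match."""
    match = CONFIG_RE.match(line)
    return match.groupdict() if match else None


sample = "System Configuration: Sun Microsystems sun4u Sun Fire 280R (2 X UltraSPARC-III+)"
print(parse_config(sample))
```

The whole line is handled in one expression, and the optional group deals with the parentheses being present or absent without a separate if/else.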

(imported comment written by jeko1791)

Agreed. I’ll admit, I’m weak with regular expressions, but need to get better. They are very powerful.

Thanks.

(imported comment written by BenKus)

This will inspire you to learn regular expressions: http://xkcd.com/208/

Ben

(imported comment written by SystemAdmin)

Ben,

Sorry to dredge up an old post, but I’d noticed this (and other) posts about regexes before, and it made me think that perhaps there need to be more simple examples where the regex command is used, to help it sink in a bit more easily for people. (Perhaps BigFix should actually use regexes a bit more, as a good example.)

Below I’ve reproduced a common line that is used all the time (and just cries out for refactoring), which is the test for OS.

I’ve listed four examples:

  • the first is the usual multi-OR’d version

  • the next is a simpler form that I’ve tended to use for a while

  • the third is a simple equivalent regex form

  • the last is a slightly better regex, as it eliminates the redundant ‘Win’ text in all the comparisons. Not a huge ‘win’ in this case (haha, ‘win’, get it!?!..{cough}…sorry) but in other instances it could help to reduce the length of your comparison quite a bit.

Luckily the speed difference isn’t an issue, but generally a couple of simple OR’d statements will be quicker.

Q: ((name of it = "WinXP") OR (name of it = "WinXP-2003") OR (name of it = "Win2003") OR (name of it = "WinVista") OR (name of it = "Win2008") OR (name of it = "Win7") OR (name of it = "Win2008R2")) of operating system

Q: "WinXP|WinXP-2003|Win2003|WinVista|Win7|Win2008|Win2008R2" contains name of operating system

Q: name of operating system = regex "^(WinXP|WinXP-2003|Win2003|WinVista|Win7|Win2008|Win2008R2)$"

Q: name of operating system = regex "^(Win(XP|XP-2003|2003|Vista|7|2008|2008R2))$"
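The four forms can be compared side by side in a quick Python sketch. This is an analogy only: Python's `in` stands in for the relevance `contains`, `re` stands in for the regex inspector, and the function names are made up. One thing it shows is that the substring form is looser than the others, since any fragment of the list string also passes.

```python
import re

NAMES = ["WinXP", "WinXP-2003", "Win2003", "WinVista", "Win7", "Win2008", "Win2008R2"]


def by_or_chain(name):
    # Form 1: one equality test per name, OR'd together.
    return (name == "WinXP" or name == "WinXP-2003" or name == "Win2003"
            or name == "WinVista" or name == "Win2008" or name == "Win7"
            or name == "Win2008R2")


def by_substring(name):
    # Form 2: is the name a substring of the pipe-separated list?
    return name in "WinXP|WinXP-2003|Win2003|WinVista|Win7|Win2008|Win2008R2"


# Form 3: anchored alternation of the full names.
FULL_RE = re.compile(r"^(WinXP|WinXP-2003|Win2003|WinVista|Win7|Win2008|Win2008R2)$")
# Form 4: the common "Win" prefix factored out of the alternation.
FACTORED_RE = re.compile(r"^(Win(XP|XP-2003|2003|Vista|7|2008|2008R2))$")


def by_full_regex(name):
    return FULL_RE.match(name) is not None


def by_factored_regex(name):
    return FACTORED_RE.match(name) is not None


for name in NAMES + ["Win200"]:
    print(name, by_or_chain(name), by_substring(name),
          by_full_regex(name), by_factored_regex(name))
```

All four agree on the seven real names; only the substring form also accepts a partial string such as "Win200", so the anchored regexes are the stricter tests.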

-Jim

(imported comment written by NoahSalzman)

Nice Jim… you get a gold .* for that example!

{cough}