In general, I recommend narrowing the scope of what lines / strings you are actually parsing with RegEx by using "lines containing", "preceding text", "following text", and similar inspectors, as well as whose filters. Often you don’t end up needing RegEx at all.
There are a few different reasons I recommend this approach over regex alone:
- A regex that needs to handle any input and find just the string pattern you are looking for in particular is going to be much more complicated and error prone.
- Regex can take an indeterminate amount of time to evaluate that varies based upon the input text. It is not possible to predict how long it will take with certainty. This is partially mitigated by having simplified regex that only operates on limited input text.
- It can be hard to tell what a RegEx is supposed to do by just looking at the RegEx itself. It is non-obvious.
- Every RegEx engine can have slightly different quirks, which can be annoying.
- It is possible in some cases to have a RegEx that effectively never finishes evaluating on certain inputs (often called catastrophic backtracking), which could have bad consequences. This usually depends on the RegEx engine, so I don’t think it applies to BigFix.
Benefits of RegEx over other methods alone:
- Sometimes it is the only option to accomplish the task.
- You may have existing RegEx that you are already using in other languages.
- It is easier to find RegEx examples online.
Even where RegEx is the best method, you are almost always better off using a hybrid approach that combines string parsing, filtering, and RegEx.
RegEx is valuable for input validation. It is not hard to parse out words that are likely email addresses from an arbitrary string, but it is very hard to validate that they are “valid” email addresses without RegEx. This is less needed in cases where you are reasonably sure that the input only contains valid items already.
Example:
Q: substrings separated by " " whose(it contains "@" AND it contains ".") of "ab.c abc def abc@def.com abc@abc@def.com @.com xyz"
A: abc@def.com
A: abc@abc@def.com
A: @.com
T: 2556
Only 1 of the results is actually a valid email address, but many other possibilities have been eliminated, which means that using RegEx for validation only needs to run on a more limited set of possible inputs.
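To illustrate the filter-first, validate-second idea outside of BigFix, here is a minimal Python sketch (not relevance). The function name `likely_emails` and the pattern `EMAIL_RE` are made up for this example, and the pattern is deliberately simple rather than a full email validator.

```python
import re

# Assumption: a deliberately simple pattern, not a complete email validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def likely_emails(text):
    """Cheap string filter first, regex validation second."""
    # Step 1: the same narrowing as the relevance above:
    # keep only words that contain both "@" and ".".
    candidates = [w for w in text.split(" ") if "@" in w and "." in w]
    # Step 2: the regex only runs on the few surviving candidates,
    # and must match the whole word, not just part of it.
    return [w for w in candidates if EMAIL_RE.fullmatch(w)]


sample = "ab.c abc def abc@def.com abc@abc@def.com @.com xyz"
print(likely_emails(sample))  # ['abc@def.com']
```

The regex never sees "ab.c", "abc", "def", or "xyz" at all, so it can stay short and only has to reject the two near misses.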
This gets only the 2nd command from a multi-command line, without RegEx (a partial example, not a complete solution):
Q: (preceding text of first " " of it | it) of following texts of firsts " && " of it whose(it contains " && ") of "28 0 * * * root test -x /etc/cron.daily/popularity-contest && /etc/cron.daily/popularity-contest --crond "
A: /etc/cron.daily/popularity-contest
T: 2484
This would give the 7th word from the string:
Q: tuple string items 6 of concatenations ", " of substrings separated by " " of "28 0 * * * root test -x /etc/cron.daily/popularity-contest && /etc/cron.daily/popularity-contest --crond "
A: test
T: 2457
This combines both, but only works properly for exactly 2 commands:
Q: ( ( tuple string items 6 of concatenations ", " of substrings separated by " " of it ) ; ( (preceding text of first " " of it | it) of following texts of firsts " && " of it whose(it contains " && ") of it ) ) of "28 0 * * * root test -x /etc/cron.daily/popularity-contest && /etc/cron.daily/popularity-contest --crond "
A: test
A: /etc/cron.daily/popularity-contest
T: 2451
My point isn’t that this is the right solution, or the best solution, but just an example of how it is possible to do some things without regex. What is hard to do without regex is handle anywhere from 1 to an unknown number of commands.
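For comparison, a hybrid of plain string splitting plus one tiny regex can handle any number of commands. This is a rough Python sketch (not relevance), and `cron_commands` is a hypothetical name. It assumes the system crontab layout used in the example (5 schedule fields, then a user, then the command text) and that commands are chained only with "&&". It does not handle quoting, ";", or "||".

```python
import re

# The only regex needed: the first run of non-whitespace characters.
FIRST_WORD_RE = re.compile(r"\S+")


def cron_commands(line):
    """Return the first word of every '&&'-chained command in a crontab line."""
    # String parsing first: split off the 5 schedule fields and the user.
    # The 7th part is everything that remains, i.e. the command text.
    parts = line.split(None, 6)
    if len(parts) < 7:
        return []  # not enough fields to contain a command
    names = []
    # Split the command text into individual commands (1 or many).
    for command in parts[6].split("&&"):
        match = FIRST_WORD_RE.search(command)
        if match:
            names.append(match.group())
    return names


line = ("28 0 * * * root test -x /etc/cron.daily/popularity-contest "
        "&& /etc/cron.daily/popularity-contest --crond ")
print(cron_commands(line))  # ['test', '/etc/cron.daily/popularity-contest']
```

The result matches the two answers from the combined relevance above, and the same function works for a line with one command or with three.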