Digit \d matching in regex

Can someone please explain why the \d character class is not matching for me? An example from the relevance language reference has:

Q: parenthesized part 1 of ( matches (regex "(\d\d\d\d)(\d\d)(\d\d)" ) of "20051201")
A: 2005

But when I run qna on Red Hat 5 with client 9.2.0.363 or 9.0.787.0, the \d digit character class does not seem to match. I get this:

Q: parenthesized part 1 of ( matches (regex "(\d\d\d\d)(\d\d)(\d\d)" ) of "20051201")
E: Singular expression refers to nonexistent object.
T: 329
I: substring

The \d is evidently being treated as a literal "d" rather than as the digit character class, as this shows:

Q: parenthesized part 1 of ( matches (regex "(\d\d\d\d)(\d\d)(\d\d)" ) of "dddddddd")
A: dddd
T: 330
I: substring

On the other hand the whitespace \s class does seem to work:

Q: parenthesized part 1 of ( matches (regex "(\d\s\s\d)(\d\d)(\d\d)" ) of "d  ddddd")
A: d  d
T: 214
I: substring

Notice that a range character class works:

Q: parenthesized part 1 of ( matches (regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])" ) of "20051201")
A: 2005
T: 359
I: substring

The POSIX digit character class works too:

Q: parenthesized part 1 of ( matches (regex "([[:digit:]][[:digit:]][[:digit:]][[:digit:]])([[:digit:]][[:digit:]])([[:digit:]][[:digit:]])" ) of "20051201")
A: 2005
T: 280
I: substring

Here is an example from the USGCB OVAL content that made me notice this issue:

<ind-def:pattern operation="pattern match">^[\s]*PASS_MIN_LEN[\s]*([\d]+)</ind-def:pattern>

(It is interesting to note that whoever wrote that put those character classes into brackets which is unnecessary.)

Have you run this through the client or just QNA? It is entirely possible you encountered a glitch in QNA rather than a glitch in the relevance engine. Also, try using \\d instead and see if that works. Either way, this sounds like something you should open a PMR on so a bug report can be logged.

There are so many ways to parse a date. I’m not sure exactly why “\d” wouldn’t work in qna, but do these work? Does the regex work through the client/console on the same systems where it doesn’t work through qna? I have had many cases where something fails in the Fixlet Debugger / QnA but works just fine through the client itself. This is why I often test relevance using Analysis properties.

Try:

("YYYY:" & parenthesized part 1 of it & " MM:" & parenthesized part 2 of it & " DD:" & parenthesized part 3 of it) of ( matches (regex "([\d]{4})([\d]{2})([\d]{2})" ) of "20051201")

YYYY

first 4 of "20051201"

MM

substring (4,2) of "20051201"

DD

last 2 of "20051201"
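
Since the date here is fixed-width, the three slices above need no regex at all. For anyone post-processing the same value outside of relevance, here is a minimal Python sketch of the same split. The function name split_yyyymmdd is made up for illustration and is not part of any BigFix tooling.

```python
def split_yyyymmdd(stamp):
    """Split a fixed-width YYYYMMDD string into (year, month, day) strings.

    Mirrors the relevance above: first 4, substring (4,2), last 2.
    Only the shape is checked (8 ASCII digits), not calendar validity.
    """
    if len(stamp) != 8 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError("expected 8 ASCII digits, got %r" % (stamp,))
    return stamp[:4], stamp[4:6], stamp[6:]


year, month, day = split_yyyymmdd("20051201")
print("YYYY:" + year + " MM:" + month + " DD:" + day)  # YYYY:2005 MM:12 DD:01
```

Slicing sidesteps the platform question entirely, at the cost of assuming the input is always exactly eight digits.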

Also, see these examples:

http://bigfix.me/relevance/details/2999373
http://bigfix.me/relevance/details/2999691
http://bigfix.me/relevance/details/2999688

http://bigfix.me/relevance/details/2996999
http://bigfix.me/relevance/details/2999372
http://bigfix.me/relevance/details/2997194
http://bigfix.me/relevance/details/2999371

The difference is most likely the regex library used on each platform.

The sample does work on Windows:

Q: parenthesized part 1 of ( matches (regex "(\d\d\d\d)(\d\d)(\d\d)" ) of "20051201")
A: 2005
I: singular substring

It doesn’t on the Mac, however:

Q: parenthesized part 1 of ( matches (regex "(\d\d\d\d)(\d\d)(\d\d)" ) of "20051201")
E: Singular expression refers to nonexistent object.
T: 5042

So most likely our regex engine differs between platforms. If memory serves, regex on Windows is provided by the Boost libraries, while the other platforms do not use Boost because of compilation issues with that library.

Is there any documentation as to which regex strings DO match on Linux? Or at least which libraries BigFix is using on Linux platforms? I’m having trouble with square brackets, which in turn is preventing me from negating character classes.

Do you have an example of the bracket issue you are having? [abc] works, as do [[:digit:]] and [[:space:]], and so does negation like [^abc] or [^[:space:]]. Most POSIX regex seems to work. \s and \S work on their own, but putting them inside a bracket class, as in [\s], does not.
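
To make that workaround mechanical, here is a rough Python sketch that rewrites the Perl-style shorthands \d, \s and \w into POSIX bracket expressions, both on their own and inside an existing [...] class. It is only a string rewrite under my own assumptions: the helper name to_posix and the mapping table are hypothetical, the negated shorthands (\D, \S, \W) are passed through untouched, and nothing here talks to the BigFix client. Test the result in qna on the target platform.

```python
# Hypothetical mapping from Perl-style shorthand to POSIX class bodies.
SHORTHAND = {"d": "[:digit:]", "s": "[:space:]", "w": "[:alnum:]_"}


def to_posix(pattern):
    """Rewrite \\d, \\s and \\w into POSIX bracket expressions."""
    out = []
    in_bracket = False  # True while scanning inside [...]
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in SHORTHAND:
                body = SHORTHAND[nxt]
                # Inside [...] emit only the class body, else wrap it.
                out.append(body if in_bracket else "[" + body + "]")
            else:
                out.append(ch + nxt)  # keep any other escape as-is
            i += 2
            continue
        if in_bracket:
            # Copy an existing POSIX class such as [:space:] verbatim.
            if pattern.startswith("[:", i):
                end = pattern.find(":]", i + 2)
                if end != -1:
                    out.append(pattern[i:end + 2])
                    i = end + 2
                    continue
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
            out.append(ch)
            i += 1
            # A leading ^ and a ] right after the opening are literal.
            if i < n and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < n and pattern[i] == "]":
                out.append("]")
                i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


# The OVAL pattern quoted earlier in the thread:
print(to_posix(r"^[\s]*PASS_MIN_LEN[\s]*([\d]+)"))
# ^[[:space:]]*PASS_MIN_LEN[[:space:]]*([[:digit:]]+)
```

Applied to the original (\d\d\d\d)(\d\d)(\d\d) pattern, this produces the same [[:digit:]] form that was shown to work on Linux earlier in the thread.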

THANK YOU @iheartrelevance!
Pointing me to the POSIX implementation and [[:space:]] got me exactly what I needed. For reference, I’m parsing the command line used in Quest Authentication Services to join a Linux/UNIX client to an Active Directory domain. I’m interested in three types of information: “operands” such as the domain name and domain controller names (if present), which have no “-” operator; “parameters”, which have the form “-x parametervalue”; and “flags”, which have the form “--flagname”. Here’s what I ended up with. It has one limitation: if a quoted field contains multiple words, I match both the entire quoted field and any words inside it that also match. It’s good enough for what I need right now:

Q: exists file "/etc/opt/quest/vas/lastjoin"
A: True
T: 69

Q: line 1 of file "/etc/opt/quest/vas/lastjoin"
A: /opt/quest/bin/vastool -u ndjsmsp-joiner$ -k /etc/vas/keytab join -n ndjsmspxadm01 -f --site-only-servers --no-timesync -u "OU=MyOU,OU=Accounts,dc=mydomain,dc=mycompany,dc=gov;OU=Administrators,dc=mydomain,dc=mycompany,dc=gov" -g "OU=JSMSGroups,OU=JSMS,OU=JS,dc=mydomain,dc=mycompany,dc=gov" --preload-nested-memberships mydomain.mycompany.gov
T: 124

// Parameters, excluding parameter "-n " (local computername override if present)

Q: (if not exists file (it) then "<no /etc/opt/quest/vas/lastjoin file>" else (matches (regex "[[:blank:]]+-\w[[:blank:]]((^-+)|(%22[^%22]*%22))") of it) whose (it does not start with " -n ") of (line 1 of file (it))) of "/etc/opt/quest/vas/lastjoin"
A: -u ndjsmsp-joiner$
A: -k /etc/vas/keytab
A: -u "OU=MyOU,OU=Accounts,dc=mydomain,dc=mycompany,dc=gov;OU=Administrators,dc=mydomain,dc=mycompany,dc=gov"
A: -g "OU=JSMSGroups,OU=JSMS,OU=JS,dc=mydomain,dc=mycompany,dc=gov"
T: 593

// Flags; a word that begins with “--”, OR the special case “-f” flag
Q: (if not exists file (it) then "<no /etc/opt/quest/vas/lastjoin file>" else (matches (regex "[:blank:]") of it) of (line 1 of file (it))) of "/etc/opt/quest/vas/lastjoin"
A: -f
A: --site-only-servers
A: --no-timesync
A: --preload-nested-memberships
T: 415

// Operands
Q: (if not exists file (it) then "<no /etc/opt/quest/vas/lastjoin file>" else (parenthesized parts 10 of matches(regex("((^|([[:blank:]])(((--)|[^-])((\w|\d|[-;,=+/.]|%22[^%22]%22)+))([[:blank:]]|$)))((([^-]\w(\w|\d|[-;,=+/.])))|%22[^%22]*%22)([[:blank:]]|$)")) of it) whose (it != " " & computer name as lowercase & " " AND it !="%09") of (line 1 of file (it)) ) of "/etc/opt/quest/vas/lastjoin"
A: /opt/quest/bin/vastool
A: join
A: mydomain.mycompany.gov
T: 892

// Is the computername overridden (boolean)?

Q: (if not exists file (it) then false else exists (matches (regex "[[:blank:]]+-n[[:blank:]]((^-+)|(%22[^%22]*%22))") of it) whose (following text of last " " of (it as trimmed string as lowercase) != computer name as lowercase) of (line 1 of file (it))) of "/etc/opt/quest/vas/lastjoin"
A: False
T: 182
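
For comparison, the same three-way split (operands, parameters, flags) can be sketched outside of relevance with a tokenizer instead of regex. This is a different technique from the queries above, not a translation of them: it uses Python's shlex module to honor the quoted fields, and the function name classify and the shortened sample command line are made up for illustration. It applies the same heuristic as the regex approach, namely that a single-dash option followed by something that does not start with "-" is treated as a parameter with a value.

```python
import shlex


def classify(command_line):
    """Split a command line into (operands, parameters, flags).

    Heuristic (an assumption, not vastool's real option table):
      --word             -> flag
      -x VALUE           -> parameter, when VALUE does not start with "-"
      -x (anything else) -> flag, e.g. the "-f" special case
      everything else    -> operand
    """
    tokens = shlex.split(command_line)  # keeps quoted fields together
    operands, parameters, flags = [], [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.startswith("--"):
            flags.append(tok)
        elif tok.startswith("-"):
            if nxt is not None and not nxt.startswith("-"):
                parameters.append((tok, nxt))
                i += 1  # the value has been consumed
            else:
                flags.append(tok)
        else:
            operands.append(tok)
        i += 1
    return operands, parameters, flags


# Hypothetical, shortened version of the lastjoin line:
sample = ('/opt/quest/bin/vastool -k /etc/vas/keytab join -f '
          '--no-timesync -g "OU=A,dc=b;OU=C,dc=d" mydomain.mycompany.gov')
operands, parameters, flags = classify(sample)
print(operands)  # ['/opt/quest/bin/vastool', 'join', 'mydomain.mycompany.gov']
print(flags)     # ['-f', '--no-timesync']
```

Because the quoted field arrives as a single token, this avoids the double-matching limitation mentioned above, but it shares the weakness that a value-less short option directly before an operand would be misread as a parameter.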

Well, I thought I was done. Too bad the regex inspector doesn’t appear to work in AIX.

Q: operating system
A: Linux Red Hat Enterprise Server 5.11 (2.6.18-400.el5)
T: 157

Q: matches (regex ("([[:blank:]]|^)\w*([[:blank:]]|$)")) of "one two three"
A: one
A: two
A: three
T: 271

Q: operating system
A: AIX 7.1
T: 15103

Q: matches (regex ("([[:blank:]]|^)\w*([[:blank:]]|$)")) of "one two three"
T: 98


I’ve just opened a PMR on it, but has anyone else seen this before in AIX and are there any workarounds? According to the Inspector List these should be available on AIX.

Obvious workaround:
matches (regex ("([[:blank:]]|^)[[:alnum:]_]*([[:blank:]]|$)")) of "one two three"

\w is just [A-Za-z0-9_]

Even though qna doesn’t support PCRE, see man perlrecharclass (and man perlre) for some of the best documentation on the subject, including POSIX regex.
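
As a quick sanity check of the claim that \w is just [A-Za-z0-9_], here is a small Python sketch that compares the two over the 7-bit ASCII range. It uses Python's own re module with the re.ASCII flag, so it says nothing about the BigFix engine itself; it only confirms that the explicit range is a faithful replacement for ASCII input.

```python
import re

# With re.ASCII, Python's \w is restricted to ASCII word characters.
shorthand = re.compile(r"\w", re.ASCII)
explicit = re.compile(r"[A-Za-z0-9_]")

# Collect every ASCII character on which the two patterns disagree.
mismatches = [
    chr(code)
    for code in range(128)
    if bool(shorthand.fullmatch(chr(code))) != bool(explicit.fullmatch(chr(code)))
]
print(mismatches)  # []
```

For non-ASCII text the two can diverge, since [[:alnum:]] in a POSIX engine depends on the locale.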

As far as I can tell, none of the regex inspectors work on AIX.

Q: matches (regex ("([[:blank:]]|^)[[:alnum:]_]*([[:blank:]]|$)")) of "one two three"
A: one
A: two
A: three
T: 458903

Q: operating system
A: AIX 7.1
T: 156964

Q: version of client
A: 9.2.0.360
T: 2123903

Well, pretty conclusive that something works. Any idea whether this depends on an external library that might not be present on my system, or if there’s a guide to which kind of regex terms are/are not supported? I’m pretty disturbed that a working regex on Linux does not work on AIX.

I don’t have access to my AIX systems from home but I’ll try this out tomorrow and let everyone know whether the regex you supply works on my platform. Thanks for giving me something to investigate!

Your regex does indeed seem to work on AIX. But I’m definitely getting different results with other regex’s (regexi? regii?) between Linux and AIX, so I’m still trying to determine what’s supported.

It’s regexen. (An apostrophe is never used to make a plural.)

Like I said, use posix character classes and avoid pcre. Did you have some other example of regex that doesn’t work other than some of the backslash sequence (\w, \d, etc.) character classes? Since every one of them has a workaround by using ranges [0-9] or posix [[:digit:]] for \d for example, what exactly are you asking?

Oh and non-capturing parentheses (?:foo|bar) don’t work, along with probably all of the similar pcre extended patterns.

Capturing parens and alternation work:

Q: parenthesized part 2 of (matches(regex "(foo)(bar)") of "foobar")
A: bar

Q: matches(regex "(foo|bar)") of "foo"
A: foo

I think it is probably safe to say you can rely on any regex syntax found in the ed and regex manpages on AIX.
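
One caveat on the non-capturing parentheses point: if you port a pattern by turning (?:...) into plain (...), every later "parenthesized part" number shifts by one. The Python sketch below shows the shift using Python's re module as a stand-in. The helper capture_all is hypothetical and deliberately naive (a plain string replace that would also rewrite an escaped literal such as \(?: ), so treat it as an illustration rather than a converter.

```python
import re


def capture_all(pattern):
    """Naively turn every non-capturing (?:...) into a capturing (...)."""
    return pattern.replace("(?:", "(")


original = r"(?:foo|bar)(baz)"
portable = capture_all(original)  # "(foo|bar)(baz)"

before = re.fullmatch(original, "foobaz")
after = re.fullmatch(portable, "foobaz")

print(before.group(1))                # baz
print(after.group(1), after.group(2))  # foo baz
```

So a query that asked for parenthesized part 1 against the original pattern would need parenthesized part 2 after the rewrite.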