CSV, Regular Expression, and Tuple String Items

Greetings, 2025!

With the march of progress, we can now write a slightly better expression using the ‘perl regex’ inspector, one that avoids some of the edge cases I noted above. The perl-compatible regex handles both of the complex cases below.

Given the regex

perl regex "(?:^|,)(?=[^%22]|(%22)?)%22?((?(1)[^%22]*|[^,%22]*))%22?(?=,|$)"

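the second capturing group, i.e. ‘parenthesized parts 2’, should contain the CSV fields, whether or not they are quoted or contain commas. (%22 is a double quote, percent-encoded for the relevance string.) Group 1 captures the opening quote only when a field starts with one, and the conditional (?(1)...) then matches [^%22]* (everything up to the closing quote, commas included) instead of [^,%22]* (everything up to the next comma or quote).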

q: parenthesized parts 2 of matches(perl regex "(?:^|,)(?=[^%22]|(%22)?)%22?((?(1)[^%22]*|[^,%22]*))%22?(?=,|$)") of "123,2.99,AMO024,Title,%22Description, more info%22,,123987564"
A: 123
A: 2.99
A: AMO024
A: Title
A: Description, more info
A:  more info
A: 
A: 123987564
T: 0.317 ms

q: parenthesized parts 2 of matches(perl regex "(?:^|,)(?=[^%22]|(%22)?)%22?((?(1)[^%22]*|[^,%22]*))%22?(?=,|$)") of "1,%223,4,5%22,6"
A: 1
A: 3,4,5
A: 4
A: 5
A: 6
T: 0.162 ms
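To try the first expression outside BigFix, here is a minimal sketch in Python, whose standard re module also supports the (?(1)...) conditional. Each %22 is written as a literal double quote, and the helper name split_csv_v1 is hypothetical. It is meant for experimenting with the pattern, not as a model of BigFix's own regex engine, so results can differ: re.finditer resumes scanning at the end of each match, and in this sketch the extra fragments seen in the BigFix results above (‘ more info’, ‘4’ and ‘5’) do not appear.

```python
import re

# First expression from the post, with each %22 written as a literal double quote.
# Group 1 captures the opening quote (if any); group 2 holds the field text.
FIELD_RE_V1 = re.compile(r'(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)')


def split_csv_v1(line: str) -> list[str]:
    """Hypothetical helper: return group 2 of every non-overlapping match."""
    return [m.group(2) for m in FIELD_RE_V1.finditer(line)]


if __name__ == "__main__":
    print(split_csv_v1('123,2.99,AMO024,Title,"Description, more info",,123987564'))
    # ['123', '2.99', 'AMO024', 'Title', 'Description, more info', '', '123987564']
    print(split_csv_v1('1,"3,4,5",6'))
    # ['1', '3,4,5', '6']
    print(split_csv_v1('a,"He said ""hi""",b'))
    # ['a', 'b']  (the field with embedded quotes is silently dropped)
```

The last call shows the embedded-quote problem in this sketch: no match can start at the quoted field, so it is skipped entirely rather than returned broken.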

There’s still one case I haven’t been able to solve, though: a field that contains embedded quotes (in standard CSV, a quote inside a quoted field is written as two double quotes). It seems it’s always something…
I also found another expression, but it has different edge cases, so for now it’s still a case of ‘pick your poison’.

perl regex "(?:,%22|^%22)(%22%22|[\w\W]*?)(?=%22,|%22$)|(?:,(?!%22)|^(?!%22))([^,]*?)(?=$|,)|(\r\n|\n)"
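The second expression can be sketched in Python the same way, again with %22 written as a literal double quote. It has three alternatives: group 1 holds a quoted field (without its outer quotes), group 2 an unquoted field, and group 3 a line break between records. The helper name split_csv_v2 is hypothetical, and as an assumption of this sketch it handles a single record and simply skips line-break matches. For comparison, it also calls Python's standard csv module, which is a full CSV parser rather than a regex. In this sketch, the regex returns a field with embedded quotes but leaves the doubled quotes as they appear in the raw text, whereas csv.reader un-doubles them.

```python
import csv
import re

# Second expression from the post, with each %22 written as a literal double quote.
# Group 1: quoted field (without its outer quotes); group 2: unquoted field;
# group 3: a line break between records.
FIELD_RE_V2 = re.compile(
    r'(?:,"|^")(""|[\w\W]*?)(?=",|"$)'
    r'|(?:,(?!")|^(?!"))([^,]*?)(?=$|,)'
    r'|(\r\n|\n)'
)


def split_csv_v2(line: str) -> list[str]:
    """Hypothetical helper: split one record, skipping line-break matches."""
    fields = []
    for m in FIELD_RE_V2.finditer(line):
        if m.group(3) is not None:
            continue  # record separator; this sketch assumes a single record
        # An empty field yields '' rather than None, so test against None.
        fields.append(m.group(1) if m.group(1) is not None else m.group(2))
    return fields


if __name__ == "__main__":
    sample = 'a,"He said ""hi""",b'
    print(split_csv_v2(sample))          # ['a', 'He said ""hi""', 'b']
    print(next(csv.reader([sample])))    # ['a', 'He said "hi"', 'b']
    print(split_csv_v2('1,"3,4,5",6'))   # ['1', '3,4,5', '6']
```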

Refs: “Regex for parsing Microsoft-style CSV data” (GitHub) and “regex - Split a CSV where some entries have double quotes” (Stack Overflow)
