I'm attempting to concatenate all text found between quotes. In the example below there are two sets of empty double quotes; however, I'm at a loss as to what needs to go between the double quotes to escape a quote character. Any assistance would be greatly appreciated.
if exist file "C:\test.txt" then (concatenation ";" of (following texts of firsts "" of preceding texts of lasts "" of lines whose (it as lowercase contains "test") of file "C:\test.txt") as string) else "No Data"
substrings separated by a given delimiter will simply break up the text wherever it finds that delimiter, removing the delimiter itself. Your query is actually hard to generalize in relevance. Can you guarantee that the string "test" will always appear in between the quotes and not in the rest of the text? If so, this will work:
concatenation ";" of substrings separated by "%22" whose (it != "" AND it contains "test") of concatenation of lines whose (it contains "test") of file "c:\test.txt"
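A note for anyone following along: %22 is the percent-encoded escape for a literal double quote inside a relevance string, which is what was missing between the empty quotes in the original query. As a minimal sketch against a made-up input string, the splitting should behave like this in the Fixlet Debugger:

q: substrings separated by "%22" of "abc%22test%22def"
A: abc
A: test
A: def

The whose clause in the query above then keeps only the non-empty pieces that contain "test" before concatenating them with ";".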
First and foremost the assistance with this is very much appreciated.
There are two scenarios.
The string "test" will never reside outside of the quotes.
The string "test" might not reside within the quotes; however, I'm foreseeing a need to concatenate all text between the quotes in a property / analysis.
The only way I know of to do it in relevance is extremely slow, so I really really really don’t recommend using it. It has running time proportional to (size of file * number of quotes). On a 5k text file, it takes 1.5 seconds. On a 200k text file, it didn’t return after a couple of minutes. It works on the idea that something is in quotes if it is preceded by an odd number of quotation marks.
concatenation ";" of substrings separated by "%22" whose ( number of ( characters ( positions of it ) whose ( it = "%22" ) of preceding text of it ) mod 2 = 1 ) of concatenation of lines of file "c:\test.txt"
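As a concrete illustration of the parity idea, take a made-up line like "foo"bar"foo"bar: splitting it at the quote characters gives the pieces (empty), foo, bar, foo, bar, which are preceded by 0, 1, 2, 3, and 4 quotation marks respectively. Only the pieces preceded by an odd count (the two foo's) are inside quotes, so the query should return foo;foo.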
Jesse’s clause will not have a slowdown like this though, so if you can guarantee that “test” will always appear in the string, that’d be ideal. If you can’t guarantee that, then relevance is probably not the best way to do this.
Brian’s example inspired me to try a slight modification to the relevance he suggested.
Rather than concatenating the entire file at the beginning and parsing that huge concatenation, evaluate the lines individually and then concatenate the results. Also, filter for lines that contain a %22 to avoid processing lines that will not produce a result anyway. I think you will see impressive gains in speed doing it this way.
q: concatenation ";" of substrings separated by "%22" whose ( number of ( characters ( positions of it ) whose ( it = "%22" ) of preceding text of it ) mod 2 = 1 ) of
concatenation of lines of file "c:\test.txt"
A: 3test;6test;1.test;test56;1.4test
T: 8.300 ms
I: singular string
q: concatenation ";" of substrings separated by "%22" whose ( number of ( characters ( positions of it ) whose ( it = "%22" ) of preceding text of it ) mod 2 = 1 ) of
lines whose (it contains "%22") of file "c:\test.txt"
A: 3test;6test;1.test;test56;1.4test
T: 1.815 ms
I: singular string
By the way, Brian, that is some pretty slick string processing in there with the mod 2. Do you mind running the new relevance against your 200k file to check for speed? I tested it against a 5000k file with only a few lines containing quotes and got these results:
q: concatenation ";" of substrings separated by "%22" whose ( number of ( characters ( positions of it ) whose ( it = "%22" ) of preceding text of it ) mod 2 = 1 ) of concatenation of lines of file "c:\searchresults.txt"
A: test2;test 3;test1
T: 11997.690 ms
I: singular string
q: concatenation ";" of substrings separated by "%22" whose ( number of ( characters ( positions of it ) whose ( it = "%22" ) of preceding text of it ) mod 2 = 1 ) of lines whose (it contains "%22") of file "c:\searchresults.txt"
I tested on a 300k file that consisted of about 20,000 lines of
"foo"bar"foo"bar
and it took 9 seconds to run, which is a huge improvement over what I had. On a 1MB file with around 1,000 quoted strings, it took only 158ms. It doesn’t sound like there’s going to be tens of thousands of strings in the files though, so Brolly’s relevance will probably do the trick. The downside of it is that it doesn’t allow quotes to span multiple lines, or if the file doesn’t have line breaks it has the same slow performance as before. (but it doesn’t sound like either of those should be a problem?)