Relevance Optimization

ANaik · May 8, 2021, 8:49pm

Use Case:
I need to extract the keyword that follows "RESULT: " in a log file, then use that keyword to fetch the lines containing it from the “test.txt” file.

I want to optimize the relevance because evaluation takes too long on a large file, and the message “Inspector interrupted” is sometimes displayed. Please let me know the possible solutions.

Relevance:
Q: (((lines whose (it contains (following text of firsts "RESULT: " of (lines of file "/test.log"))) of file "test.txt" of it) as trimmed string) as trimmed string) of folders "/tmp"
A: Test1 abc

Below is the content of test.log file:
: Testing
: RESULT: Test1

Content of /tmp/test.txt
Test1 abc
Test2 123
Test3 !@#
Test4 ,./

brolly33 · May 11, 2021, 12:13pm

This is a classic use case for sets and tuples for efficiency.

I want to thank you for providing all of the needed details, including a copy of the relevance that was not quite working right. This is a very good way to ask questions here.

In your relevance, for each line in test.txt, you iterate through all lines of test.log.
So if you have 100 lines in test.txt and 100 lines in test.log, you will create 100 x 100 = 10,000 objects.

If you create a tuple of sets of the lines of each file, you end up with 1 x 1 objects. Each set holds 100 elements, but it’s still just 1 x 1, so effectively 200 objects as opposed to 10,000.
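
To make the counting argument concrete, here is a rough Python sketch (not BigFix relevance) that contrasts re-scanning the log for every line of test.txt with reading each file once. The function names and the step counter are made up for illustration, and the counts model the explanation above rather than the inspector's actual internals.

```python
def naive_matches(txt_lines, log_lines):
    """Re-scan every log line for each text line (the slow pattern)."""
    steps = 0
    hits = []
    for line in txt_lines:
        for log_line in log_lines:
            steps += 1
            if "RESULT: " in log_line:
                keyword = log_line.split("RESULT: ", 1)[1].strip()
                if keyword in line:
                    hits.append(line)
    return hits, steps


def read_once_matches(txt_lines, log_lines):
    """Extract the keywords once, then make a single pass over the text."""
    steps = 0
    keywords = []
    for log_line in log_lines:
        steps += 1
        if "RESULT: " in log_line:
            keywords.append(log_line.split("RESULT: ", 1)[1].strip())
    hits = []
    for line in txt_lines:
        steps += 1
        if any(k in line for k in keywords):
            hits.append(line)
    return hits, steps


# Sample data copied from the original question.
log = [": Testing", ": RESULT: Test1"]
txt = ["Test1 abc", "Test2 123", "Test3 !@#", "Test4 ,./"]

naive_hits, naive_steps = naive_matches(txt, log)
fast_hits, fast_steps = read_once_matches(txt, log)
print(naive_hits, naive_steps)  # ['Test1 abc'] 8
print(fast_hits, fast_steps)    # ['Test1 abc'] 6
```

With 100 lines in each file, the first function takes 10,000 steps and the second takes 200, which is the same ratio described above.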

Here is the walkthrough:

I created both files to test with, but I test on Windows, so I changed your folder a little.

q: (file "test.txt" of it, file "test.log" of it) of (folder "/tmp"|parent folder of client)
A: "test.txt" "" "" "" "", "test.log" "" "" "" ""
T: 6.713 ms
I: singular ( file, file )

Pull the sets of lines to avoid cross-multiplying the lines.

q: (elements of item 0 of it, elements of item 1 of it) of (set of lines of file "test.txt" of it, set of lines of file "test.log" of it) of (folder "/tmp"|parent folder of client)
A: : RESULT: Test1, Test1 abc
A: : RESULT: Test1, Test2 123
A: : RESULT: Test1, Test3 !@#
A: : RESULT: Test1, ( Test4 ,./ )
A: : Testing, Test1 abc
A: : Testing, Test2 123
A: : Testing, Test3 !@#
A: : Testing, ( Test4 ,./ )
T: 3.042 ms
I: plural ( string, string )

Now filter the lines to pick only the one(s) you want.

q: (elements of item 0 of it, elements of item 1 of it) whose (item 1 of it contains following text of first "RESULT:" of item 0 of it as trimmed string) of (set of lines of file "test.txt" of it, set of lines of file "test.log" of it) of (folder "/tmp"|parent folder of client)
A: : RESULT: Test1, Test1 abc
T: 1.605 ms
I: plural ( string, string )
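
For readers who think in a general-purpose language, the filter step behaves roughly like the following Python sketch. It is only an analogy: `product` stands in for the relevance tuple of elements, and the variable and function names are invented here.

```python
from itertools import product

# Sample data copied from the original question; sets mirror "set of lines".
log_lines = {": Testing", ": RESULT: Test1"}
txt_lines = {"Test1 abc", "Test2 123", "Test3 !@#", "Test4 ,./"}


def keyword_of(log_line):
    """Return the trimmed text after the first "RESULT:", or None if absent."""
    _, sep, tail = log_line.partition("RESULT:")
    return tail.strip() if sep else None


# Pair every log line with every text line, then keep only the pairs where
# the text line contains the keyword extracted from the log line.
pairs = sorted(
    (log_line, txt_line)
    for log_line, txt_line in product(log_lines, txt_lines)
    if keyword_of(log_line) is not None and keyword_of(log_line) in txt_line
)
print(pairs)  # [(': RESULT: Test1', 'Test1 abc')]
```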

And finally, get rid of the Windows folder stuff.

q: (elements of item 0 of it, elements of item 1 of it) whose (item 1 of it contains following text of first "RESULT:" of item 0 of it as trimmed string) of (set of lines of file "test.txt" of it, set of lines of file "test.log" of it) of folder "/tmp"

ANaik · May 19, 2021, 1:43pm

Thanks @brolly33 This worked!!

ANaik · May 26, 2021, 6:11pm

Hi @brolly33,

Thanks a lot for the solution provided!!

The relevance worked fine until there were multiple entries matching Test1*, such as Test1, Test11, and Test123. All of these entries were picked.

I have modified the relevance slightly, to fetch the exact keyword.

Below is the relevance:

q: (elements of item 0 of it, elements of item 1 of it) whose (item 1 of it contains (following text of first "RESULT:" of item 0 of it as trimmed string) & " ") of (set of lines of file "test.txt" of it, set of lines of file "test.log" of it) of folder "/tmp"

I added " " to differentiate between the multiple entries. In this case, it picks only “Test1”.
However, the evaluation time becomes too long when there are more than 70 entries matching “Test1*”.

Evaluation Time : "T: 12516639"
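
The trailing-space match described above can be illustrated outside relevance with a short Python sketch. It also shows a stricter alternative that compares the first whitespace-separated token; that alternative assumes the keyword is always the first word of each test.txt line, which is a guess about the file layout. Both function names are hypothetical.

```python
def matches_with_space(line, keyword):
    """Mirror of the 'keyword & " "' test: substring match plus a space."""
    return (keyword + " ") in line


def matches_first_token(line, keyword):
    """Stricter check: the first word of the line must equal the keyword."""
    parts = line.split()
    return bool(parts) and parts[0] == keyword


# Sample data copied from the post.
txt = [
    "Test1 abc", "Test2 123", "Test3 !@#", "Test4 ,./",
    "Test12 abcd", "Test13 abce", "Test16 abcf", "Test17 abcg",
]

by_space = [line for line in txt if matches_with_space(line, "Test1")]
by_token = [line for line in txt if matches_first_token(line, "Test1")]
print(by_space)  # ['Test1 abc']
print(by_token)  # ['Test1 abc']
```

The two checks agree on this sample. They differ on a line such as "XTest1 abc", which the trailing-space test accepts and the first-token test rejects.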

Below is the content of test.log file:

Testing
RESULT: Test1
Data1
Data2

Content of /tmp/test.txt

Test1 abc
Test2 123
Test3 !@#
Test4 ,./
Test12 abcd
Test13 abce
Test16 abcf
Test17 abcg

Can you help me optimize this? It would be a great help.
Thanks in advance!!