Improving relevance efficiency

JasonWalker · March 13, 2024, 6:57pm

Part of the issue is this structure. Think of each whose() block as a nested loop: here, for every line tested by the lines whose() filter, you are recalculating the number of lines of the file. A file with 50 lines will calculate the number of lines fifty times. Below I retrieve only the ‘number of lines’ to count the results of each query, to make the time differences clearer.

q: number of lines of file "c:\temp\shortfile.txt"
A: 50
T: 0.469 ms

// this will repeat the 'number of lines' calculation for each line in the file...
q: number of lines whose (line number of it > number of lines of file "c:\temp\shortfile.txt" - 2) of file "c:\temp\shortfile.txt"
A: 2
T: 14.037 ms

(I tried the same query with a fifty-thousand-line file, but it hasn’t returned an answer after more than ten minutes.)
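
As a rough analogy outside of relevance language, the Python sketch below counts how many times the line count gets computed in each structure. It is not BigFix code; the function names and the `calls` counter are made up for illustration, and `number_of_lines` merely stands in for re-reading the file.

```python
# Illustrative analogy only: plain Python, not BigFix relevance.
calls = 0  # how many times the "number of lines" has been computed


def number_of_lines(lines):
    """Count the lines, recording each call (stands in for re-reading the file)."""
    global calls
    calls += 1
    return len(lines)


def last_two_nested(lines):
    """Like the whose() form: the count is recomputed for every line tested."""
    return [text for number, text in enumerate(lines, start=1)
            if number > number_of_lines(lines) - 2]


def last_two_hoisted(lines):
    """Compute the count once, then reuse it for every comparison."""
    total = number_of_lines(lines)
    return [text for number, text in enumerate(lines, start=1)
            if number > total - 2]


sample = [f"line {i}" for i in range(1, 51)]  # a 50-line "file"

calls = 0
nested_result = last_two_nested(sample)
nested_calls = calls    # one count per line tested

calls = 0
hoisted_result = last_two_hoisted(sample)
hoisted_calls = calls   # a single count
```

Both functions return the same two lines; only the number of counts differs, and that number grows with the length of the file in the nested form.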

A better structure may be the one posted by @atlauren in “Last N lines of a file containing a string” (post #12):


q: number of lines (integers in ((number of lines of it - 1),(number of lines of it))) of file "c:\temp\shortfile.txt"
A: 2
T: 0.861 ms

This only has to retrieve the number of lines in the file twice (once for each ‘number of lines of it’), and then retrieves just those lines explicitly by their line numbers.

There may even be a slightly faster version that only has to retrieve the number of lines once…

q: number of lines (integers in ((it - 1, it) of number of lines of it)) of file "c:\temp\shortfile.txt"
A: 2
T: 0.671 ms

I think if you apply this to your query above you should get some much faster results.
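
The idea of asking for lines by number, rather than testing every line, can be sketched in Python as an analogy. This is not relevance language: `integers_in` and `lines_by_number` are hypothetical helpers that only mimic the behavior described in this post.

```python
# Illustrative analogy only: plain Python, not BigFix relevance.

def integers_in(first, last):
    """Inclusive range of integers, so integers_in(1, 3) gives 1, 2, 3."""
    return list(range(first, last + 1))


def lines_by_number(lines, numbers):
    """Fetch lines by 1-based line number, skipping numbers outside the file."""
    count = len(lines)
    return [lines[n - 1] for n in numbers if 1 <= n <= count]


sample = [f"line {i}" for i in range(1, 51)]  # a 50-line "file"

total = len(sample)                     # the line count, obtained once
wanted = integers_in(total - 1, total)  # the last two line numbers
last_two = lines_by_number(sample, wanted)
```

Here the line count is obtained once and no per-line comparison is made, which is the property the faster queries rely on.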

This technique uses the ‘integers in’ inspector to give an explicit list of numbers, for example:

Q: integers in (1, 3)
A: 1
A: 2
A: 3

Combine that with the lines (<line number>) of <file> inspector to retrieve those specific lines by number…