Because this is a search across multiple files, and I suppose those files can be quite large, I’d start by pointing out some optimizations (usually I save optimizing for the end, but I already expect this to be a slow search).
Since we don’t need to retrieve the line containing the string, only to verify that one exists, we can test
content of file contains <string>
rather than
exists lines whose (it contains <string>) of file
This avoids the overhead of splitting a file into its lines and then scanning each line. My test files only have 1 line each, but the saving should grow as the number of lines in the files increases.
q: (name of it, exists lines containing "Dynatrace OneAgent failed to connect to Dynatrace Cluster Node" of it) of files of folders "c:\temp\test"
A: 1.txt, False
A: 1_installation_.txt, False
A: 2_installation_.txt, True
A: 3_installation_.txt, False
A: 4_installation_.txt, True
A: 5_installation_.txt, False
T: 16.310 ms
I: plural ( string, boolean )
q: (name of it, content of it contains "Dynatrace OneAgent failed to connect to Dynatrace Cluster Node") of files of folders "c:\temp\test"
A: 1.txt, False
A: 1_installation_.txt, False
A: 2_installation_.txt, True
A: 3_installation_.txt, False
A: 4_installation_.txt, True
A: 5_installation_.txt, False
T: 11.425 ms
I: plural ( string, boolean )
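As a rough analogy outside of relevance, the difference between the two checks can be sketched in Python. This is not how the BigFix client implements either inspector. The function names are hypothetical, and the sketch assumes the search string never spans a line break.

```python
NEEDLE = "Dynatrace OneAgent failed to connect to Dynatrace Cluster Node"


def any_line_contains(text: str, needle: str) -> bool:
    # Analogous to "exists lines containing <string> of file":
    # split the text into lines first, then test each line in turn.
    return any(needle in line for line in text.splitlines())


def content_contains(text: str, needle: str) -> bool:
    # Analogous to "content of file contains <string>":
    # one substring test over the whole text, with no per-line split.
    return needle in text
```

Both functions return the same boolean for a given text; the second simply skips building the list of lines.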
Given these, we can find the maximum of the modification times of the files that match our criteria:
q: maximum of modification times of find files "*installation_*" whose (content of it contains "Dynatrace OneAgent failed to connect to Dynatrace Cluster Node") of folders "c:\temp\test"
A: Fri, 15 Jul 2022 11:30:16 -0500
Now we need to find which of the files matches that latest modification time, and it’s important not to repeat the file content search for every file. Suppose we used a structure like this:
q: names of items 0 of (find files "*installation_*" of it, maximum of modification times of find files "*installation_*" whose (content of it contains "Dynatrace OneAgent failed to connect to Dynatrace Cluster Node") of it, it) whose (modification time of item 0 of it = item 1 of it) of folders "c:\temp\test"
A: 4_installation_.txt
T: 23.593 ms
I: plural string
We do get a valid answer, and it’s correct - but the problem is the evaluation time. As each file is considered in item 0, the entire directory is scanned (again) to find the latest matching file in item 1. So for my five "*installation_*" files in item 0, the content scan is repeated five times - each file is opened and read five times, for 25 reads in total.
A better structure is to calculate that latest modification time (once) and then pass it up into another “loop”, like this:
q: names of items 1 of (item 0 of it, find files "*installation_*" of item 1 of it ) whose (item 0 of it = modification time of item 1 of it) of (maximum of modification times of find files "*installation_*" whose (content of it contains "Dynatrace OneAgent failed to connect to Dynatrace Cluster Node") of it, it) of folders "c:\temp\test"
A: 4_installation_.txt
T: 5.929 ms
I: plural string
This time, the content of all five files is scanned once to find the latest modification time, and then the modification times of the five files are checked again to see which one matches it. We can see the difference in the evaluation times: the first option takes about 23.6 ms, while the second is down to about 5.9 ms.
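To make the read counts concrete, here is a small Python model of the two structures. It is only a sketch of the evaluation pattern, not of the relevance engine. The file table, modification times, shortened search string, and all function names are hypothetical stand-ins.

```python
# Hypothetical stand-ins for the files: name -> (modification time, content).
FILES = {
    "1_installation_.txt": (100, "ok"),
    "2_installation_.txt": (200, "failed to connect"),
    "3_installation_.txt": (300, "ok"),
    "4_installation_.txt": (400, "failed to connect"),
    "5_installation_.txt": (500, "ok"),
}
NEEDLE = "failed to connect"


class ReadCounter:
    """Counts how many times a file's content is read."""

    def __init__(self) -> None:
        self.reads = 0

    def content(self, name: str) -> str:
        self.reads += 1
        return FILES[name][1]


def latest_match_time(counter: ReadCounter) -> int:
    # Reads every file once and returns the newest modification time
    # among the files whose content contains the search string.
    return max(FILES[n][0] for n in FILES if NEEDLE in counter.content(n))


def repeated_scan(counter: ReadCounter) -> list:
    # First structure: the maximum is recomputed for every candidate file.
    return [n for n in FILES if FILES[n][0] == latest_match_time(counter)]


def single_scan(counter: ReadCounter) -> list:
    # Second structure: the maximum is computed once, then reused.
    latest = latest_match_time(counter)
    return [n for n in FILES if FILES[n][0] == latest]


slow, fast = ReadCounter(), ReadCounter()
print(repeated_scan(slow), slow.reads)  # ['4_installation_.txt'] 25
print(single_scan(fast), fast.reads)    # ['4_installation_.txt'] 5
```

In this model the first structure performs 25 content reads and the second performs 5, which is the same ratio of work the two relevance statements are described as doing.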