More efficient way to parse large log file via relevance?

I’m trying to grab the last (most recent) occurrence of a text match in what tends to be a very large log file (>200MB) generated by the Google Drive Sync app. I tried the following, which @jgstew very helpfully provided in another post, but it takes too long to evaluate, which I believe is why it fails with an “Inspector Interrupted” error on some computers with very large log files.

maxima of ((following text of last "-" of it & " " & preceding text of first "-" of following text of first "-" of it as integer as month as three letters & " " & preceding text of first "-" of it) of (preceding text of first " " of it) of lines whose (it contains "Drive batch completed:") of file ("/Users/" & (name of logged on user) & "/Library/Application Support/Google/Drive/user_default/sync_log.log") as date)

This takes around 12 seconds to return an answer in QnA locally on my Mac, but given the “Inspector Interrupted” error on remote machines, I’m going to assume it’s taking too long to evaluate there. Is there a more efficient way of returning the line/date of the last occurrence of “Drive batch completed”?

Thanks!
Sean

Yes. With very large log files, you have to limit the search to the last X lines of the file, then look within those lines for the newest occurrence.

The other option is to run an action every so often to output the desired result to a text file and then read back that result with relevance.

Let me start by rewriting this relevance a bit to give me something better to work with.

( maxima of (it as date) of (following text of last "-" of it & " " & preceding text of first "-" of following text of first "-" of it as integer as month as three letters & " " & preceding text of first "-" of it) of (preceding text of first " " of it) of lines containing "Drive batch completed:" of it) of files "Library/Application Support/Google/Drive/user_default/sync_log.log" of folders of folders "/Users"

This should give the newest sync date for every user on the system. If more than one sync_log.log file exists, it will return the newest success date from each. The relevance could be adjusted slightly to return not just the newest sync date per file, but also the user name it belongs to.
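
For example, something along these lines should pair each user folder’s name with that user’s newest sync date (an untested sketch assembled from the same pieces as above; users without a sync_log.log simply won’t appear):

( name of it, (
    maxima of (it as date) of (
      following text of last "-" of it & " "
      & preceding text of first "-" of following text of first "-" of it as integer as month as three letters & " "
      & preceding text of first "-" of it
    ) of (preceding text of first " " of it)
    of lines containing "Drive batch completed:"
    of files "Library/Application Support/Google/Drive/user_default/sync_log.log" of it
) ) of folders of folders "/Users"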

This relevance will still have a problem with log files that are too large. It is going to take too long to run.

You need to determine how far back in the log file you want to look for success. Should it be the last 100 lines? The last 1000 lines? How far back is enough so that if no success is found then it clearly indicates a problem?

Also consider a similar relevance property that looks for failure instead of success. How far back should you go to look for the newest failure? That may not need to reach as far into the past, but it is also helpful when the relevance looking for success comes up empty: presumably the one looking for failure will then have a hit. In the cases where both return something, you can compare the newest failure to the newest success to determine what state the system is in.
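
As a rough illustration of that final comparison (sketch only; the two literal dates below are stand-ins for the “newest success” and “newest failure” results the two properties would return):

/* sketch: the two literal dates stand in for the newest success and newest failure */
( if item 1 of it > item 0 of it
    then "failing since " & (item 1 of it as string)
    else "last success " & (item 0 of it as string)
) of ("09 Mar 2016" as date, "10 Mar 2016" as date)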

I may have more efficient code to limit the results to the last X lines, but this is what I have at hand:

(item 1 of /* This "it" refers to the last 100 lines of the file */ it)
whose ( /* remove empty lines, which is why this relevance can return less than 100 lines per file */ it as trimmed string != "" )
of (
  /* the number of lines of the file from the previous statement */ item 1 of it,
  (lines of /* the file object */ item 0 of it)
)
/* This whose statement is responsible for filtering for only the last 100 lines of the file */
whose (
  (line number of /* lines of the file */ item 1 of it)
  > ( /* number of lines of the file */ item 0 of it - 100 /* the number of lines to return, which is subtracted from the total # of lines */ )
)
of ( /* the parent file object itself */ it, number of lines of it ) of files

This has some inline comments to help explain what the heck it is doing.


This should be the combined relevance, which is a bit messy.

( maxima of (it as date) of (
    following text of last "-" of it & " "
    & preceding text of first "-" of following text of first "-" of it as integer as month as three letters & " "
    & preceding text of first "-" of it
  )
  of (preceding text of first " " of it)
  of it whose (it contains "Drive batch completed:")
  of (item 1 of /* This "it" refers to the last 100 lines of the file */ it)
  whose ( /* remove empty lines, which is why this relevance can return less than 100 lines per file */ it as trimmed string != "" )
  of (
    /* the number of lines of the file from the previous statement */ item 1 of it,
    (lines of /* the file object */ item 0 of it)
  )
  /* This whose statement is responsible for filtering for only the last 100 lines of the file */
  whose (
    (line number of /* lines of the file */ item 1 of it)
    > ( /* number of lines of the file */ item 0 of it - 100 /* the number of lines to return, which is subtracted from the total # of lines */ )
  )
  of ( /* the parent file object itself */ it, number of lines of it ) of it
) of files "Library/Application Support/Google/Drive/user_default/sync_log.log" of folders of folders "/Users"

Let me know if that works at all, but also where it doesn’t. It may need adjusting for the number of lines of the file it searches, among other things. Also, since I don’t have a real file to work from, I am unable to test this relevance as I go, so I’m definitely doing this a bit blindly.

Let me know how this goes, or if there are any issues.

I’d also be curious what the execution time is on any systems you test this on with QnA and how that correlates to the number of lines in the file.

Hey,

You can simplify the line relevance using:

lines ((integers in (it,it-30)) of (number of lines of it)) of file

Change it-30 to it-(the number of lines you want to grab).
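
For example, applied to the sync log from the original post (untested, with 100 as the line count; adjust to taste):

lines ((integers in (it, it - 100)) of (number of lines of it)) of file ("/Users/" & (name of logged on user) & "/Library/Application Support/Google/Drive/user_default/sync_log.log")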

Bill

Thanks @strawgate

I was trying to find that relevance, but could only find my older relevance.

There is a performance issue on 9.5 with that relevance that may affect my other relevance as well. You’d have to try both on 9.5.x and something earlier, like 9.2, to really tell what the differences in performance are.

Thanks. This evaluates very slowly, with each answer (A) taking about 2 seconds. It looks quite inefficient, as if the entire file is being reprocessed for each line.

Hi – yes, it’s a known performance defect in 9.5.*

In versions prior to 9.5, I believe it is as performant as the other method.

Thanks @jgstew, this does work. However, I am finding that it takes 15 seconds to evaluate locally in QnA, versus 7 seconds with the original relevance! That’s on my 58MB log file.

I also logged this as a ticket with IBM (the relevance often produces an error remotely, but never in local QnA). My main gripe is that it does not take very long to process in QnA locally, around 7 seconds, but when processed via a property in BigFix it takes ten times longer! Their explanation:

The difference of evaluation time between the console property and QnA is normal. The relevance on the property will go through the client, and clients respect the idle/work settings (2% by default) while QnA uses full cpu power.

But that doesn’t really help me.

Hi,

The real issue is that relevance really isn’t meant for this. Relevance runs constantly in the background, and when you give it a large file it will crunch through that whole file, many times per day, looking for just those last few lines.

In these situations, what I do is write a script that tails the log file, and run that script once or twice a day. I save the tailed output to a file like “sync_log_tail.log”. This tailed log may only have 100 lines in it total.

Then my relevance just looks at that tailed log. The tailed log gets updated as often as I have my script running on the system.
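
A minimal sketch of such an action, assuming a Mac client with tail available (the 100-line count and the sync_log_tail.log name are just illustrative):

// sketch: write the last 100 lines of the logged-on user's sync log to a small tail file
wait /bin/sh -c "tail -n 100 '/Users/{name of logged on user}/Library/Application Support/Google/Drive/user_default/sync_log.log' > '/Users/{name of logged on user}/Library/Application Support/Google/Drive/user_default/sync_log_tail.log'"

The analysis relevance then parses sync_log_tail.log instead of the full log.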

The relevance I provided adds overhead and will make evaluation a bit slower on a small log file, but it should help cap the maximum amount of time it takes on a very large log file.

You should only use relevance like this in an analysis property that reports once every hour, preferably once every 6 hours or less often.

I would recommend trying my relevance on a much larger log file. It is also possible that my relevance won’t work as well on 9.5 due to a performance regression that @strawgate mentioned.

You really need to do the comparison of all 3 options on both 9.2 and 9.5 to really tell what is going on in QnA.

Even though my relevance appears slower in this case, it might be much faster on some systems.

@strawgate is also right that relevance isn’t really meant to parse very large log files like this, but hopefully once the 9.5 performance regression is fixed, cases like this will be easier to handle.

Well, that quote is mine, so you got it from me through support.

The client has to respect that balancing code and is always slower than QnA, which doesn’t balance and uses full power. QnA is generally 50 times faster (100% vs 2% by default).
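
For reference, that balancing is governed by the client resource settings, whose documented defaults (10 ms of work per 480 ms of sleep) are where the roughly 2% figure comes from. They can be raised with an action like this, with care (the WorkIdle value shown is only an example):

// defaults are WorkIdle=10 and SleepIdle=480 (milliseconds), roughly 2% CPU
setting "_BESClient_Resource_WorkIdle"="20" on "{parameter "action issue date" of action}" for client
setting "_BESClient_Resource_SleepIdle"="480" on "{parameter "action issue date" of action}" for client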

Thanks very much for your help with this. I’ve concluded that it’s probably better to go with an action script that writes to a plist file (on Mac) and the registry (on Windows). I take your point that relevance is not designed for reading log files, but it should be. Whatever happened to evolving a product with new features and capabilities? I’ve been using BigFix here and there for 5 years and sadly haven’t seen many improvements (at least in my line of work). Doing this via action script is far more complicated (and thus will lead to more failures and stale data, which won’t be insignificant across 30,000 machines), far less efficient, and requires a lot more time and energy to create. On that last point, the only reason I can take this approach is that I’ve already spent the hours going this route before and have working scripts. Thanks again for your help with the relevance approach!

Relevance can read log files, and it is something I do very frequently.

The issue comes when you have a log file that never rotates and grows very large. Relevance currently doesn’t do a good job of reading a file of that size.

Though I do agree: hopefully this will be easier in the future.

There have definitely been quite a lot of improvements to BigFix over time. The REST API is one of the most significant; Unicode support is up there as well. It could definitely move faster, but it isn’t standing still either.