Parsing log file returns 14EndOfFileError(s) and Inspector interrupted(s)

eg2428 · April 21, 2024, 3:23am

We have some scanning software that occasionally writes an “Agent ID” value to a log file. This Agent ID is unique to each device, and helpful to correlate data between our platforms, so we’d like to retrieve it in BigFix.

An example line from the log file is below:
2024-04-20 21:04:31,974 [INFO] [agent.agent_beacon]: Agent Info -- ID: 0bd66ff79c27******************** Version: 4.0.7.26 (1712348425)

This line appears somewhat frequently in the log file, and I’d like to get the data between ID: and Version:. The log file is frequently written to, and across our deployment, ranges from ~1MB to 10MB.

I’ve been using the follow relevance to grab the ID:

Q: unique values of (preceding texts of firsts "Version: " of (following texts of firsts "ID: " of (locked lines containing "Agent Info" of files ("C:\Program Files\Rapid7\Insight Agent\components\insight_agent\common\agent.log" ; "/opt/rapid7/ir_agent/components/insight_agent/common/agent.log"))))
A: 0bd66ff79c27******************** 
T: 199.383 ms
I: plural string with multiplicity

The problem I’m having is that a huge number of endpoints return <error>, with the reason being The expression could not be evaluated: File error "14EndOfFileError" on Macs, and Inspector interrupted. on Windows

I have not found any correlation between the size of the log file and the error occurring.

Stripping the relevance down to just the locked lines gives me very similar evaluation times:

Q: (locked lines containing "Agent Info" of files ("C:\Program Files\Rapid7\Insight Agent\components\insight_agent\common\agent.log" ; "/opt/rapid7/ir_agent/components/insight_agent/common/agent.log"))
A: 2024-04-20 21:04:31,974 [INFO] [agent.agent_beacon]: Agent Info -- ID: 0bd66ff79c27******************* Version: 4.0.7.26 (1712348425)
A: 2024-04-20 21:05:04,036 [INFO] [agent.agent_beacon]: Agent Info -- ID: 0bd66ff79c27******************** Version: 4.0.7.26 (1712348425)
[...]
A: 2024-04-21 03:16:08,211 [INFO] [agent.agent_beacon]: Agent Info -- ID: 0bd66ff79c27******************** Version: 4.0.7.26 (1712348425)
T: 199.469 ms
I: plural file line

so I don’t think there’s any way to improve on the runtime.

Is it perhaps just doomed without deploying a separate fixlet to grab and store the data?

ageorgiev · April 22, 2024, 6:34am

Don’t have any Macs in my environment, so can’t check but chances are something is not right with relevance (using inspector that doesn’t work on Mac or paths are wrong; etc.Just from purely looking at your code, I would add if-statement for each of the paths and see if that makes a difference but that’s just me:

unique values of (preceding texts of firsts "Version: " of (following texts of firsts "ID: " of (locked lines containing "Agent Info" of files ((if (exist file of it) then (it) else (nothing)) of "C:\Program Files\Rapid7\Insight Agent\components\insight_agent\common\agent.log" ; (if (exist file of it) then (it) else (nothing)) of "/opt/rapid7/ir_agent/components/insight_agent/common/agent.log"))))

As for the Windows-error, I can tell you that it is linked to the length of evaluation (so yes, the size of the file will be directly responsible as the bigger the file, the longer the evaluation time will be) and the fact the agent is sent interruption signal (notification for new action; manual “send refresh”, etc) and when that happens it drops the current evaluation and proceeds with whatever higher-priority action it is. If you wait long enough you may get machine with interruption error to actually report a value without doing anything (just actually waiting for one of next re-evaluations to happen at a time when there are no interruptions) but that will depend on how busy your environment/clients are…

Generally speaking, BigFix is not really designed to interrogate big files and work with big data sets and if I am putting something similar to what you are describing would consider an “audit action” instead - a quick script (powershell for example), that will run at desired frequency/times and just do the same: Get-Content “…” | Select-String …
and get the script to simply write to a file/reg key the actual Agent ID you are after, and then you can have a property just reads that. This will remove the hogging of the agents extensively to read entire log for a single string; will remove any CPU implications from it (and yes, heavy properties like this can cause CPU spikes on machines); it will give you control at what times of day the heavy-lifting occurs; etc… things that with property/relevance you will not have.

eg2428 · April 22, 2024, 3:42pm

Thank you for taking a look.

I don’t believe there’s any syntactical issues with the Mac relevance, as it does run correctly for the most part. And I would hesitate to add if-statements that add even more to the runtime.

I completely understand that all signs point to this being a length-of-evaluation issue, but I’m struggling to figure out why.

We’re seeing this on brand new computers (e.g. M3 Macs) with small (<5MB) log files. It feels like something BigFix should be able to handle without a sweat. We’re also seeing it run fine on much older computers with much larger (~25MB) log files, which I’m surprised/impressed by.

If it was just a mix of older endpoints and huge log files, I could concede and say it’s just taking a while to evaluate, but the lack of consistency in when it fails is confusing.

It looks like there is a trend towards it failing more often on large files, but doesn’t it seem too high a failure rate for small log files as well? In terms of reading files with relevance, what constitutes a “large”/“small” file for BigFix? Perhaps I just need to readjust my expectations.

JasonWalker · April 22, 2024, 4:42pm

You should probably open a support ticket on it.

That said, this looks like it should be a pretty efficient query, I don’t see it creating any cross-product tuples which is where performance usually becomes a problem.

I wonder whether it’s something specific to how Rapid7 is handling the log file - i.e. whether it’s doing something to overwrite portions of the file, or truncate the file, while we’re trying to read it?

I also have to think there’s a better way to get the agent ID rather than reading it from the log; but that doesn’t really help with the general case of reading a log file.

eg2428 · April 22, 2024, 5:31pm

Thank you, I’ve opened a ticket.

I don’t think Rapid7 is removing/overwriting anything from the log (couldn’t see anything like that after monitoring it for a bit), but it does write to it very frequently. The only operation I’m aware of is that once the log file reaches 25MB, it rotates to a new log and puts the old one in a .zip file. I’m not sure if locked lines offers any protection against file rotation, but I think the errors are showing up much more frequently than logs reaching 25MB and being caught mid-rotation.

I too would have thought there was a better way to get the agent ID, but I’ve yet to find it. I’ll ask around on that.

JasonWalker · April 22, 2024, 5:43pm

Hm, that log file rotation sounds like it behaves similarly to our own BES Client Log, which is the use-case I frequently have for ‘locked lines of file’ and the inspector works pretty reliably there. I was hoping it was something very specific like a rolling log file of the most recent 20 MB of events, with older ones constantly overwritten (more like a Windows Event Log)