Relevance: Speeding Up Event Log Queries

Hello!

I wanted to take you on a journey of how slow event log queries can be and some tricks you can use to speed it up.

Relevance and WMI

On my system at home the security event log is 20mb but at work it’s 80mb and querying that with BigFix is slow…

Q: number of records of security event log
A: 121063
T: 7885.829 ms

Doing the same all-encompassing call with WMI ends after a couple of minutes in a timeout…

Q: number of select objects "* from Win32_NTLogEvent where Logfile = 'Security'" of WMI
E: The expression could not be evaluated: Windows Error 0x80041032: Call cancelled

With Relevance you can only filter this after getting them so this is the best case for this query, anything we add onto it or change about it will only make the evaluation time worse.

Q: number of records whose (time generated of it > now) of security event log
A: 0
T: 8024.829 ms

Using WMI to capture event logs is also slow but the more we filter it the faster that it gets instead of slower:

Events from Today and Yesterday

Q: number of SELECT objects ("* FROM Win32_NTLogEvent Where Logfile = 'Security' and TimeGenerated > '" & (year of it as string & ((month of it) as two digits) & (day_of_month of it as two digits)) of (current date - 1*day) & "000000.000000+120' and EventCode= 4799") of wmi 
A: 15253
T: 12613.578 ms

Events from Today:

Q: number of SELECT objects ("* FROM Win32_NTLogEvent Where Logfile = 'Security' and TimeGenerated > '" & (year of it as string & ((month of it) as two digits) & (day_of_month of it as two digits)) of (current date - 0*day) & "000000.000000+120' and EventCode= 4799") of wmi
A: 3213
T: 2215.715 ms

So with Relevance we start slow and get slower as we add filtering criteria but with WMI we start slower and get faster when we add filtering criteria.

The big advantage from this comes not with fast computers but with slow computers. Often times we see event log queries on slow computers take minutes where the same query on a newer machine takes seconds. If we can use WMI to limit the scope of the query we can make it take seconds on slow machines and seconds on fast machines!

Another way!

So I don’t like using WMI and I wanted to find a way to really limit the return from the query. Limiting it based on time can still result in a huge number of events being returned if something is being verbose into the event logs.

I thought – hey! I can just return individual event records like this:

records (1;2;3;4;5;6) of application event log

And then I ran a test to see how fast it was:

Q: number of records (1;2;3;4;5;6) of application event log
A: 6
T: 3.410 ms

Q: number of records of application event log
A: 4780
T: 434.685 ms

Hey! That’s pretty fast!

The only issue is that event log 1;2;3;4;5;6 might not always exist. If a log has rolled over like the security log frequently does then the first log in the event log might be record number 123123 and not 1. So we have to use:

oldest record number of <event log>
record count of <event log>	

When combined, (oldest record number of it + record count of it) of application event log we start going places!

We can now do (with the super slow security event log):

Q: number of records ((it-1;it-2;it-3;it-4;it-5;it-6) of (oldest record number of it + record count of it)) of security event log
A: 6
T: 2.923 ms

Notice how my fastest Security Event Log query was 8 seconds before and now we are at 3 milliseconds!

My Solution

The final piece of relevance can stand in place of anywhere you use records and the item 1 of it - 500 controls how many records get returned (500):

records ((integers in(item 0 of it + item 1 of it - 1,maximum of (item 0 of it + item 1 of it - 500;item 1 of it))) of (record count of it, oldest record number of it)) of event log <event log>

We can now return 500+ event logs without grinding the system to a halt!

Q: number of records ((integers in(item 0 of it + item 1 of it - 1,maximum of (item 0 of it + item 1 of it - 500;item 1 of it))) of (record count of it, oldest record number of it)) of security event log
A: 500
T: 270.890 ms

Note: maximum of (…;item 1 of it) is an optimization that stops us from requesting event records that don’t exist in the case that there are fewer than 500 events in the log.

On my systems this stops being faster at around the 5000 event mark so if you’re pulling more than 5000 events you should consider testing just grabbing the whole log.

Happy Eventing!
Bill

10 Likes

I like it that integers in will count up or down without needing to specify:

Q: integers in(0,3)
A: 0
A: 1
A: 2
A: 3
T: 0.032 ms
I: plural integer

Q: integers in(3,1)
A: 3
A: 2
A: 1
T: 0.043 ms
I: plural integer
3 Likes