Looking for some suggestions on what I can look at.
I have a policy action that is set to run every 15 minutes. I do not expect it to actually fire every 15 minutes, but I would hope for every 30 to 60 minutes. All this policy does is check the value of a registry key and, depending on that value, set another key. When the action runs, it runs very quickly, so that is not the problem.
The biggest problem is that this only happens on a few systems. Most of my environment runs as I would expect. I thought that this might be an issue with load on the server, but the CPU is only at about 5% with a peak every few hours of maybe 40%. I then thought it might be a hypervisor issue since the system I was checking is a VM. I then checked other VMs on the same hypervisor and they did not have this issue. On top of this, I found a physical server with 12 processors and 16GB RAM that has the issue. That server is actually a DR server, so it had no load at all on it.
I have enabled the Usage Profiler on it and I can see that the slower system is for sure… slow. When I compare the Usage Profiler logs between a normal system and the slow system, they show many of the same top 10 items, but the evaluation times on the slow system are just way longer. I also see that the sample count on the slow system is about 3,000 to 5,000, while on a normal system it is about 50,000 to 60,000.
Also of note, both the normal and slow systems evaluate all the same content, including custom baselines, analyses and fixlets/tasks.
Here is an example of the time differences between two systems:
Mon, 26 Nov 2018 08:59:33 -0700 Complete file Enterprise Security/2014 Security Bulletins (Apps).fxf: 121381 microseconds
Tue, 20 Nov 2018 04:52:48 -0700 Complete file Enterprise Security/2014 Security Bulletins (Apps).fxf: 4244516 microseconds
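To put a number on that gap, here is a rough Python sketch that pulls the evaluation time out of "Complete file" lines and compares them. It assumes every line of interest has the same shape as the two quoted above; the function names `parse_eval_times` and `slowdown` are made up for illustration and are not part of any BigFix tooling.

```python
import re

# Assumption: lines of interest end in "Complete file <name>: <n> microseconds",
# exactly like the two profiler lines quoted in this post.
LINE_RE = re.compile(r"Complete file (?P<name>.+): (?P<us>\d+) microseconds\s*$")


def parse_eval_times(lines):
    """Return (file name, microseconds) for every matching log line."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            results.append((match.group("name"), int(match.group("us"))))
    return results


def slowdown(fast_us, slow_us):
    """How many times longer the slow evaluation took than the fast one."""
    return slow_us / fast_us


SAMPLE = [
    "Mon, 26 Nov 2018 08:59:33 -0700 Complete file Enterprise Security/"
    "2014 Security Bulletins (Apps).fxf: 121381 microseconds",
    "Tue, 20 Nov 2018 04:52:48 -0700 Complete file Enterprise Security/"
    "2014 Security Bulletins (Apps).fxf: 4244516 microseconds",
]

if __name__ == "__main__":
    times = [us for _, us in parse_eval_times(SAMPLE)]
    print(f"slowdown: {slowdown(min(times), max(times)):.1f}x")
```

For the two lines above this works out to roughly a 35x difference for the same site file. Running the same comparison across every "Complete file" line from both logs might show whether the slowdown is uniform or concentrated in a few sites.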
I know that we have 2 baselines with about 125 components in them and I am going to work with my server team to clean those up, but these baselines are processed by normal and slow systems.
On one of the slow systems, I did increase the CPU to 10% and it made a bit of a difference, but it is still nowhere near a normal system.
I do have a PMR open already, but we have been on it for a couple of weeks and I need some other ideas to test.
I can share the policy action if needed, but I am not sure it will point to anything, as this is not an issue on about 95% of our systems.