Non-Windows (UNIX, Linux) Performance Tuning Questions

(imported topic written by boerio)

I’m interested in talking with people who have large-scale UNIX/Linux installations and are tuning their BESClient to use as few system resources as possible. I’d like to get average CPU utilization down to <= 0.5% and still have reasonable usability.

To date, we’ve played mostly with the _BESClient_Resource_WorkNormal and _BESClient_Resource_SleepNormal settings, and this seems to do a nice job of knocking down CPU spikes during the gather process. What I don’t know is whether we’re doing more harm than good.

Also, I think changing _BESClient_Comm_CommandPollIntervalSeconds reduces the number of times the gather process runs, right? Since gathering seems to consume a moderate amount of resources, setting this number relatively high should help (the downside being that we would lose some flexibility in getting new Fixlets pushed into production).

(imported comment written by boerio)

Similarly, I’d like to know if anyone has done any disk I/O characterization to see how disk activity comes into play. Of course, with custom Fixlets, mileage will vary.

(imported comment written by BenKus)

Hi Boerio,

Changing _BESClient_Resource_WorkIdle and _BESClient_Resource_SleepIdle will change how much time the agent spends on “background” evaluation (which is the vast majority of the time). The default is to work for 10ms and sleep for 480ms. To make the agent use less CPU, change the 10ms to something lower, like 5 or less (there is a Task to do this on the BES Support site called “BES Client Setting: CPU Usage”). Lowering this value makes the agent slower to notice that new Fixlets are relevant or that policy actions need to be run.

_BESClient_Resource_WorkNormal and _BESClient_Resource_SleepNormal change how the agent acts when you send it a new action or new Fixlet. The default is to work for 20ms and sleep for 1ms until the new information has been evaluated (which is usually very fast) and then go back to “idle” mode. Lowering WorkNormal (or increasing SleepNormal) makes the agent respond more slowly to new actions/Fixlets and refreshes.

These settings also affect disk I/O usage. The agent’s concept of “working” includes both CPU and disk activity, so if you limit WorkNormal/WorkIdle, you will slow down the disk usage as well.
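As a rough illustration of the arithmetic behind these work/sleep pairs, here is a minimal Python sketch. It assumes the agent strictly alternates between working and sleeping for the configured times, so the ratio is only an upper bound on the share of one core the agent could use; real utilization depends on what the agent is actually evaluating. The function names are hypothetical and are not part of any BES tooling.

```python
def duty_cycle(work_ms: float, sleep_ms: float) -> float:
    """Fraction of wall-clock time spent in the 'work' phase.

    Assumes strict work/sleep alternation, so this is a ceiling,
    not a measurement.
    """
    if work_ms < 0 or sleep_ms < 0 or work_ms + sleep_ms == 0:
        raise ValueError("work_ms and sleep_ms must be non-negative and not both zero")
    return work_ms / (work_ms + sleep_ms)


def max_work_ms(target: float, sleep_ms: float) -> float:
    """Largest work slice that keeps the duty cycle at or below `target`."""
    if not 0 < target < 1:
        raise ValueError("target must be a fraction between 0 and 1")
    # Solve work / (work + sleep) = target for work.
    return target * sleep_ms / (1 - target)


if __name__ == "__main__":
    print(f"idle default   (10ms/480ms): {duty_cycle(10, 480):.2%}")   # 2.04%
    print(f"idle reduced   (5ms/480ms):  {duty_cycle(5, 480):.2%}")    # 1.03%
    print(f"normal default (20ms/1ms):   {duty_cycle(20, 1):.2%}")     # 95.24%
    # Work slice needed for a 0.5% ceiling with the default 480ms sleep.
    print(f"work slice for 0.5% ceiling: {max_work_ms(0.005, 480):.2f} ms")  # 2.41 ms
```

Under these assumptions, the idle defaults cap the agent at roughly 2% of one core, and a 0.5% ceiling would need a work slice of about 2.4ms with the default 480ms sleep.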

Note with all this that the agents are already tuned to very low values and the vast majority of agents (millions and millions across the world) work with these values with no impact on the computers. Are you seeing something different?

The _BESClient_Comm_CommandPollIntervalSeconds setting probably won’t help you limit gathering behavior. Usually this setting is used to make the client poll more often, because the default is once every 24 hours.

Ben

(imported comment written by SystemAdmin)

Also, in some situations it might make sense to put the client into a sleep mode where it will really do nothing.

http://support.bigfix.com/cgi-bin/kbdirect.pl?id=247

Note that the behavior is very different if you use this setting: the client isn’t ‘going slower’; it is really doing absolutely nothing while it is asleep and won’t respond to actions, properties, etc. at all until it wakes up. Sometimes, though, you might want the agent to work for only a few hours a day, and this sleep mode can provide that behavior and keep the agent quiet outside of the work window.

(imported comment written by boerio)

Tyler,

If I put the client to sleep using the quiet settings, I should see absolutely no CPU utilization, correct? It should be a sleeping process, only checking ~1x/minute to see if it’s time to really wake up.

Am I correct?

Thanks.

(imported comment written by SystemAdmin)

Yes, that’s the expected behavior. The client sits in a small loop checking whether it’s time to wake up; if not, it sleeps and tries again later. So you do see some CPU usage, but an extremely small amount, something like a few cycles every minute.
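The wake-up check described here can be pictured as a simple polling loop. The Python sketch below is only an illustration and is not the BES client’s actual code: the function names, the window times, and the 60-second poll interval (taken from the ~1x/minute figure mentioned earlier in this thread) are all assumptions.

```python
import time as _time
from datetime import datetime, time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def wait_for_window(start: time, end: time, poll_seconds: float = 60.0,
                    clock=lambda: datetime.now().time(),
                    sleep=_time.sleep) -> int:
    """Sleep in `poll_seconds` steps until the work window opens.

    Returns the number of polls that found the window still closed.
    `clock` and `sleep` are injectable so the loop can be exercised
    without actually waiting.
    """
    polls = 0
    while not in_window(clock(), start, end):
        sleep(poll_seconds)  # the process does nothing else while asleep
        polls += 1
    return polls
```

For example, `wait_for_window(time(1, 0), time(4, 0))` would block until 01:00 local time, waking once a minute only to compare the clock against the window, which is why the cost is a few cycles per check rather than zero.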

(imported comment written by boerio)

Thanks for the information. This looks like a very viable setup for our infrastructure.

Is there anything in BigFix that says “only cycle once through the checks in a 24-hour period” rather than “you’re awake for this period of time, so repeatedly check everything you can in the time you have”?

(imported comment written by SystemAdmin)

Not really. We try to simplify how the client works by saying it goes in cycles, because that is easy to understand and the behavior is approximately like that, but it’s more complex at the lowest levels. So, the client doesn’t have a start/end point to track evaluation cycles… Also, you have to consider that if a patch is applied or something is changed by the client, it might need several cycles to reach the final state of the computer.