BigFix client responsiveness

This might be a long post, but I'm just looking for some advice and/or best practices. I've also opened a case with HCL Support on this.
We have developed our own Software Portal where users can go and install/uninstall software. The reason for this is that the client requests consistency for end users, and this Software Portal is in use with our current managed solution, which is being replaced by BigFix 9.5.x later this year (October 2020).
The software being deployed is commercially available and is installed silently on the user's workstation (e.g. Google Chrome, SAP, Adobe Reader, Office 2016…). For this to work from the client side we're using the concept of a flag file: if it's present on the workstation, the software will be installed/uninstalled. This flag file is created by our Software Portal.

This is where we face the issue: in our testing we have seen the installation start within a minute, but also only after 20 minutes, and that's not a good end-user experience, so we're looking for ways to improve the overall experience and make it consistent. When checking the logs I'm looking at the time when the action starts; I know that installing something like SAP, which comes in at 1 GB, will take time to download the binaries depending on bandwidth etc. While I don't like to compare apples and pears, with our current management solution the user experience when requesting a software installation is instantaneous, and that's what we're trying to achieve with BigFix as well.
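
For illustration, here is a minimal sketch of what the portal side boils down to; the folder, file names and the relevance are assumptions for the example, not our actual implementation:

```python
# Hypothetical sketch of the portal-side trigger (paths and names are made up).
# A per-application, open-ended policy action in BigFix watches for the flag,
# e.g. with relevance along the lines of:
#     exists file "install_7zip.flag" of folder "C:\ProgramData\SoftwarePortal"
# and runs the silent install the next time the client evaluates that action.
from pathlib import Path

FLAG_DIR = Path(r"C:\ProgramData\SoftwarePortal")   # assumed flag folder


def request_install(app: str) -> None:
    """Drop the flag file the BigFix policy action is watching for."""
    FLAG_DIR.mkdir(parents=True, exist_ok=True)
    (FLAG_DIR / f"install_{app}.flag").touch()


def cancel_request(app: str) -> None:
    """Remove the flag so the policy action goes non-relevant again."""
    (FLAG_DIR / f"install_{app}.flag").unlink(missing_ok=True)


if __name__ == "__main__":
    request_install("7zip")
```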

What we have tried already:

  • Increase the CPU cycles of the BigFix client when our Software Portal is started. The problem here is that the client is currently configured for the default CPU usage (<1-2%) and it does not pick up the changes fast enough, so the user experience is still not great. This detection is done based on the flag file, and we then change the following 2 settings: “_BESClient_Resource_SleepIdle” and “_BESClient_Resource_WorkIdle”. This would be our preferred solution, as it only increases CPU cycles while the user is requesting software to be installed; once the software is installed and the portal is closed, the CPU settings are reset back to the defaults.
    For each application we have created 1 action (policy, open-ended) whose relevance becomes true if the flag file is present.
    I’ve already tried the following:
  • Set the CPU usage using the “BES Client Settings: CPU Usage” task. We have tried high and very high and do see improvement, but on the other hand it also impacts workstation performance; especially on our basic models we can see the CPU constantly between 5-15% (see the duty-cycle sketch after the timing tables below).
  • Also tried using a registry value instead of a flag file, but it made no difference in responsiveness.
    Below is a table with timings when using high/very high.
    Production Environment
    << Change CPU Settings to “High” WorkIdle=50 | SleepIdle=450 >>
    Repair 7-Zip   requested 17:27   started 17:29
    Remove 7-Zip   requested 17:33   started 17:36
    Install 7-Zip  requested 17:38   started 17:43
    Repair 7-Zip   requested 17:55   started 17:55
    Remove 7-Zip   requested 17:56   started 17:56
    Install 7-Zip  requested 17:58   started 18:02

5/25/2020 @ 18:09 CET
<< Change CPU Settings to “High++” WorkIdle=100 | SleepIdle=400 >>
Remove 7-Zip   requested 18:09:38   started 18:12:05
Install 7-Zip  requested 18:13:29   started 18:13:30
Repair 7-Zip   requested 18:14:49   started 18:16:16
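
To put those WorkIdle/SleepIdle pairs in perspective: as I understand it, the client roughly alternates WorkIdle milliseconds of work with SleepIdle milliseconds of sleep, so the approximate duty cycle is WorkIdle / (WorkIdle + SleepIdle). A quick back-of-envelope check (the “default” pair is roughly the out-of-box values, not measured on our clients):

```python
# Approximate CPU duty cycle for the throttle settings tried above.
# The client roughly alternates WorkIdle ms of work with SleepIdle ms of sleep,
# so CPU usage while evaluating is about WorkIdle / (WorkIdle + SleepIdle).
configs = {
    "default":          (10, 480),   # roughly the out-of-box values (~2%)
    "High (table 1)":   (50, 450),
    "High++ (table 2)": (100, 400),
}

for name, (work_idle, sleep_idle) in configs.items():
    duty = work_idle / (work_idle + sleep_idle)
    print(f"{name:18} ~{duty:.0%} CPU while the client is evaluating")
```

That is roughly in line with the 5-15% CPU we see on the basic models at the higher settings.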

Our production environment is on 9.5.13.x; we have around 125,000 devices, and currently we're only using the OS patching and compliance features. We're now working on software distribution and expect anywhere from 90-120 applications in our environment.

Any ideas/suggestions are welcome. If HCL Support comes back with answers, I will post them here for anybody else that's facing similar issues.

Thx.

Where I've seen customers do something similar, the key to responsiveness has been to create a new Action to perform the installation, rather than changing a file or registry key to trigger a Policy Action for the installation.

The reason for that lies not in the “client responsiveness”, but in the client evaluation loop.

The BigFix client uses an Evaluation Loop to constantly re-check all content for applicability. Each Fixlet, Open Action, Computer Group, etc. has its content re-evaluated as part of the loop.

New content, such as a newly-created Fixlet, Action, etc. takes priority and is moved to the “front of the loop”. Once the initial evaluation is complete, this content takes its normal place in the loop.

You can check how long your client's loop is, on average, with the relevance expression “average duration of evaluationcycle of client”. This returns the average time of the last ten cycles. Times between fifteen and thirty minutes are entirely normal; I've seen upwards of two hours, depending on the CPU throttling and how much content is enabled in the deployment (activated analyses, subscribed sites, etc.).
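
If it helps, here is a rough way to check that on a test machine, assuming the qna utility that ships in the BES Client install folder will take the expression on standard input (the path and that behaviour are assumptions; if piping doesn't work, run qna interactively and paste the expression):

```python
# Ask the local client how long its evaluation loop is, on average.
# Assumption: qna.exe accepts relevance expressions piped via stdin.
import subprocess

QNA = r"C:\Program Files (x86)\BigFix Enterprise\BES Client\qna.exe"  # assumed default install path
RELEVANCE = "average duration of evaluationcycle of client"

result = subprocess.run([QNA], input=RELEVANCE + "\n",
                        capture_output=True, text=True, timeout=60)
print(result.stdout)  # the A: line is the average of the last ten cycles
```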

So, if you have a file or registry key that makes your Policy Action for the installation relevant, you can expect the client to evaluate that very quickly - the first time. But when you change that file/value later, you have to wait for that fixlet to come back around in the evaluation loop before the action can be triggered again.

Your best option is to have your self-service store create a new Action on-the-fly, likely using the REST API. Since this is a “New Action”, it will take priority in the evaluation loop and you can get started in seconds.
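
For illustration, here is a minimal sketch of what that can look like from your portal's back end, using Python and the requests library; the server URL, credentials, site name, fixlet ID and computer name are all placeholders you would substitute:

```python
# Sketch: create a one-shot action via the BigFix REST API (POST /api/actions).
# All values below are placeholders -- server, credentials, site, fixlet, target.
import requests

SERVER = "https://bigfix.example.com:52311"   # root server, default REST port
AUTH = ("portal-svc-account", "********")     # operator account used by the portal


def deploy_fixlet(site_name: str, fixlet_id: int, computer_name: str) -> str:
    """POST a SourcedFixletAction targeting a single computer; returns the reply XML."""
    bes_xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<BES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BES.xsd">
  <SourcedFixletAction>
    <SourceFixlet>
      <Sitename>{site_name}</Sitename>
      <FixletID>{fixlet_id}</FixletID>
      <Action>Action1</Action>
    </SourceFixlet>
    <Target>
      <ComputerName>{computer_name}</ComputerName>
    </Target>
  </SourcedFixletAction>
</BES>"""
    resp = requests.post(f"{SERVER}/api/actions", data=bes_xml, auth=AUTH,
                         verify=False)  # lab only; verify the server certificate in production
    resp.raise_for_status()
    return resp.text  # the reply contains the ID of the newly created action


# e.g. deploy_fixlet("Software Portal", 42, "WS-USER-0001")
```

Because the new action targets only the machine that asked for it, the client picks it up at the front of its evaluation loop almost immediately.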

Your other option is to try to reduce the time of the client evaluation loop. Increasing the CPU throttle value is a start, but you will likely make much more headway by reducing the total number of Actions, reducing Analysis Activations, reducing the number of Subscribed Sites, and improving the efficiency of any custom properties and relevance clauses.

Jason, thx for the quick reply. Indeed, I've also tried using the REST API to create actions on the fly, and that does work and provides an immediate response. The problem with that is the volume of calls we would have in our environment. We analysed the last 4 months in our current managed environment and we average 14,000 requests/day for install/uninstall/repair of software. This would be stressful for the BigFix system, which is why we did not want to go that route but instead use 1 action per application, which was also the advice of the HCL Professional Services team that is helping us with the setup/deployment of BigFix for this customer. Do you have any experience with what the impact of this volume of REST API calls would be on our system? The other issue I see is that it introduces complexity: we also have a high number of people working from home, so for the REST API to work we would need to expose it over the Internet.
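
To give a rough sense of what that volume means per second (the 8-hour working-window assumption is just mine):

```python
# Back-of-envelope rate from the ~14,000 install/uninstall/repair requests per day.
requests_per_day = 14_000

print(f"spread over 24 hours: {requests_per_day / (24 * 3600):.2f} REST calls/second")  # ~0.16/s
print(f"spread over 8 hours:  {requests_per_day / (8 * 3600):.2f} REST calls/second")   # ~0.49/s
```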

Below is some info about actions/analyses/sites:

  • Actions: 40
  • Analyses: 16
  • External sites: 28
  • Avg cycle impact: 42 minutes; I see some at 2 hours and others at 2-3 minutes

Thx again for feedback

At that scale it does sound like you would need to take care. How the server responds to that volume depends a lot on your hardware scale and tuning, but there's certainly some potential for trouble.

Our out-of-the-box solution would be to use the Self-Service Application at the client to accept Offers for the packages the user wants, but I expect you have some business logic built around your custom app store that you need to preserve?

Other items to review include the size and relevance efficiency of your actionsite and of any custom analyses, fixlets, and baselines. For example, I've caught console operators who created really huge baselines that resulted in multi-MB fixlets and actionsite items which clients would take forever to evaluate. Other times, grossly inefficient WMI queries or relevance were the culprit. When properly cleaned up, I've seen the client evaluation cycle that Jason Walker mentioned drop significantly. This can take some detective work.