Killing unresponsive besclient process

But the process is BESClient itself, so killing BESClient is not a good option when this is being executed on 30K machines. Even if we implement a client setting, it will treat all clients under the same umbrella. Also, if we change the value on the client itself, say to 10 minutes for the time being, that will also impact any other action being executed on the machine.

Yes, this will not be a good option.

So for now, I guess I have to execute each script separately and set the client setting to something like 6 hours.

I think you may be missing a point here…all of this timeout business is to recover from an error. Your commands should not be triggering the timeout at all; if they are, you are in an error condition. You should be checking why your script or command is not completing, and go fix that.

The timeout values here are just to give the BESClient a chance to recover, after your commands have caused an error and are failing to complete and terminate normally.

Just one more thing to point out…BESClient is not the process that needs to be killed; what we are terminating is whatever program you executed with the ‘wait’ command. Given you’ve suggested this is mostly Linux, that is probably a /bin/bash or /bin/sh or something along those lines. Find the one where the parent process is BESClient.
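As a rough illustration of "find the one where the parent process is BESClient", here is a minimal Python sketch that parses the output of `ps -eo pid,ppid,comm` and lists the direct children of any process named BESClient. The function name `children_of` is made up for this example, and it assumes the agent shows up in `ps` with the command name `BESClient`.

```python
def children_of(ps_output, parent_name="BESClient"):
    """Return (pid, command) pairs whose parent process is named parent_name.

    ps_output is the text produced by: ps -eo pid,ppid,comm
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header and any line that does not start with two numbers.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))

    # PIDs of every process whose command name matches the parent.
    parent_pids = {pid for pid, _, comm in rows if comm == parent_name}

    # Direct children only: their PPID is one of the parent PIDs.
    return [(pid, comm) for pid, ppid, comm in rows if ppid in parent_pids]
```

To run it against a live machine, you could feed it `subprocess.run(["ps", "-eo", "pid,ppid,comm"], capture_output=True, text=True).stdout`. Anything it returns (typically a shell) is the candidate to investigate or terminate, rather than BESClient itself.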

Thanks @JasonWalker, I get the point. Our scripts do have an auto-kill timeout setting, but in some cases endless besclient processes still occur, so to take extra control from the BigFix side I want to put these timeouts in place.

I get that as well, but when we query Linux/UNIX devices, it is only BESClient that keeps running, and the script we executed leaves no trace to kill, other than restarting the besclient service.

What does that mean exactly? What are the symptoms?

The besclient should only hang mid action if there is a child process that is still running. If not then that is very strange.

That said, the besclient itself runs forever at ~2% CPU doing background evaluation unless you enable power saving mode. If it is the 2% CPU usage you are talking about then it has nothing to do with timeouts or hung actions. You just need to enable client power save which will drop the background evaluation to 0% cpu if nothing is changing.

Below is a snapshot of zombie BESClient processes on a Linux machine. These were all due to endless script execution, even though the script itself has an automatic cutoff within 3 minutes.

[image: snapshot of defunct BESClient processes]

I’m not as familiar with defunct process on UNIX, but…what you are seeing is definitely not normal, and I think it is likely that you are creating this condition with whatever it is you have built in to your script to do this 3-minute timeout.
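For background: on Linux, a defunct (zombie) entry is a process that has already exited but whose parent has not yet collected its exit status, so there is nothing left to kill; the entry disappears once the parent reaps it or the parent itself exits. A minimal Python sketch for listing such entries with their parent PID, by parsing `ps -eo pid,ppid,stat,comm` output, could look like this. The function name `find_zombies` is made up for this example.

```python
def find_zombies(ps_output):
    """Return (pid, ppid, command) for every process in zombie state.

    ps_output is the text produced by: ps -eo pid,ppid,stat,comm
    """
    zombies = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)
        # Skip the header and any line that does not start with two numbers.
        if len(parts) == 4 and parts[0].isdigit() and parts[1].isdigit():
            pid, ppid, stat, comm = parts
            # A process state beginning with "Z" marks a zombie.
            if stat.startswith("Z"):
                zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies
```

The parent PID column is the useful part here: it shows which process is failing to reap its children.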

Yeah, there may be some odd underlying issue here, but it is hard to know what it is without knowing what all has run on the system that might cause it. Is there an example you can share?

In some cases it could be as simple as how shells are invoked and scripts are run within them, and if the script is opening other shells.
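To illustrate how nested shells can matter, here is a hedged Python sketch of one way a wrapper could enforce a timeout without leaving stragglers: it starts the command in its own session, kills the whole process group on timeout (so shells opened by the script go too), and then waits on the child so it is reaped. This is not the poster's script; `run_with_timeout` is a hypothetical helper, and it only works on Unix-like systems.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return (timed_out, returncode).

    The child gets its own session, so its PID is also its process
    group ID and the whole group can be signalled at once.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return False, proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        try:
            # Kill the child and anything it spawned in the same group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already went away
        # Collect the exit status so the child does not linger as defunct.
        return True, proc.wait()
```

For the 3-minute cutoff discussed above, a call might be `run_with_timeout(["/bin/sh", "myscript.sh"], 180)`, where `myscript.sh` is a placeholder name. A timeout that only kills the top-level shell, or never waits on it, can leave children running or defunct.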

The BigFix agent itself is designed to be single threaded in nature and only have 1 process, the exception being what you tell it to run during an action. Do you have actions that make use of the run command instead of the wait command?