On Windows we can detect the process and kill it after a certain time, but on Linux/Unix, when we run a shell script, BESClient is just a carrier and zombie processes are generated under BESClient itself.
How do we deal with this situation? The script has a cut-over to kill itself, but for some reason BESClient doesn't stop and keeps running for N hours, so how can we add something to kill the BESClient zombie processes?
Thanks @JasonWalker, that's new to me, but we can't set or use it for all machines. There could be multiple scripts being executed on multiple servers, and each script's execution is different from the others, so setting a client setting that binds all clients to one specific time frame could create issues.
I think putting something in the action script for that specific action/script would be more useful than this client setting.
You should set the timeout for ALL machines to some large value, like 2 hours, as a failsafe. It is true you wouldn't want to set the timeout for ALL machines to some small value, but pick what you think should be safe enough for all cases and set it to that. At least set it to SOMETHING.
It will only kill the cmd.exe launched by the action. Are you certain that the cmd.exe you are seeing is not another instance of cmd? Use Process Monitor (from www.microsoft.com/sysinternals) to display the parent process of cmd.exe.
I would NOT assume that the fixlet debugger works properly with all override commands and other nuances. It is really for testing the basics. The client execution environment will always be a bit different.
This timeout should not depend on how long the action takes to execute, but how long a specific wait or run command takes to execute. You could have a baseline or an action that takes 5 hours to run for some unknown reason, but as long as no single command takes more than 2 hours to run within that execution, then the timeout should NOT be triggered.
This is again why I emphasize that the timeout should always be set to something; it is just a matter of how long. 1 hour, 2 hours, 6 hours, SOMETHING.
Generally, if you are going to run a long-running background process with BigFix, it is best to trigger it with a "run" command and let BigFix move on to other actions. (You have to be careful with this kind of thing: don't run it in the __Download folder, etc.) So even in cases where you might run a background AV scan, I would highly recommend kicking it off with BigFix, then gathering the results later with a separate action, or just an analysis.
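On the Linux side, a minimal sketch of this fire-and-forget pattern: the action only launches the job detached and records where to find the results, so the client never blocks on it. The script path, log path, and PID file here are illustrative placeholders, not anything from the thread.

```shell
#!/bin/sh
# Launch a long-running scan detached from the action so BESClient does not
# wait on it. Paths and script name are placeholders; adjust for your site.
nohup /opt/scan/run_scan.sh > /var/tmp/scan_result.log 2>&1 &

# Record the PID so a later action (or an analysis reading the log file)
# can check on or clean up the background job.
echo $! > /var/tmp/scan.pid
```

A follow-up action or analysis can then inspect `/var/tmp/scan_result.log` and `/var/tmp/scan.pid` instead of holding the original action open.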
How do I use override wait with multiple script executions within the same action? Each script has its own process to execute, and I want to set a different override wait for each of them; I can't separate them into multiple tasks.
it should not proceed. It assumes that a hung process that hits the timeout is a hard failure.
If you want the commands to run fully independently, then they should be broken up into separate actions. If the commands are NOT independent, then the hard failure on timeout is CORRECT, and you should instead use separate actions with relevance to detect that the previous step has completed successfully. There is an option to let the process continue running at timeout instead of terminating it, but then you end up with orphaned processes running forever that you have to clean up. You could clean them up manually later in the action after the timeout, but that is messy.
In general, BigFix works best if you can break things up into as many individual fixlets/tasks as possible, with relevance to detect success or failure of each step independently. It requires writing more relevance, but it ends up giving much better feedback over time about the actual state of things, especially if you want to FORCE a configuration that actually has many subparts.
Trying to think of a good published example of this.
Just to add to that, if you have a condition you expect could sometimes hang and want to handle it yourself, you could use "run" instead of "wait". Something like:
parameter "waittime"="{now}"
run c:\someprocess.exe
pause while {exists running process "someprocess.exe" AND (now - (parameter "waittime" of action as time)) < 10 * minute}
if {exists running process "someprocess.exe"}
waithidden taskkill.exe /F /IM someprocess.exe
// do other stuff
endif
The execution timeout is to give the client a way to recover if a wait process never terminates. Otherwise the client would stay stuck and not process other actions.
On Windows this will work, but I am running multiple shell scripts; in that case BESClient is just a carrier and the zombie processes are under the BESClient name. How do I deal with that?
Use the equivalent in bash (pkill, perhaps, instead of taskkill), or this:
I am not certain if this would work, but you could capture the current value of _BESClient_ActionManager_OverrideTimeoutSeconds at the top of the action (or a default value of 2 hours if unset), then set it to something lower, like 240 seconds, and set it back to the previous value (or 2 hours) at the end.
The only issue is that the client might not pick up the change immediately, so I don't know how reliable that would be. Also, if you set it too low and the timeout triggers, the setting would not get set back to the higher value at the end because the action stops, which would be a pain.
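For the Linux case, a rough shell equivalent of the Windows run/pause/taskkill example above might look like the following sketch. The script path and the 10-minute budget are placeholders; tracking the PID directly avoids matching the wrong process by name.

```shell
#!/bin/sh
# Sketch of the same pattern in shell: launch, poll with a deadline, then kill.
# /tmp/someprocess.sh is a placeholder; adjust path and timeout for your script.
/tmp/someprocess.sh &                 # launch in the background ("run")
pid=$!
deadline=$(( $(date +%s) + 600 ))     # 10-minute budget, like the relevance check

# Equivalent of the "pause while" loop: wait until exit or deadline.
while kill -0 "$pid" 2>/dev/null && [ "$(date +%s)" -lt "$deadline" ]; do
    sleep 5
done

# Equivalent of the taskkill step: terminate if still running.
if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"                       # SIGTERM; escalate to -9 only if needed
fi
```

You could wrap your real script with this so the cleanup happens inside the action itself, independent of the client's timeout setting.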
There was another forum thread about all this, from before this new setting was available, about killing all child processes of the BESClient PID.
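The child-process approach from that older thread can be sketched in shell along these lines. This assumes the daemon's process name is "BESClient" (verify on your platform) and sends SIGTERM only; treat it as a starting point, not a vetted cleanup script, since it will also kill children the client launched deliberately.

```shell
#!/bin/sh
# Sketch: terminate the child processes of the BESClient daemon.
# Assumes the process is named exactly "BESClient"; verify with ps first.
parent=$(pgrep -x BESClient | head -n 1)

if [ -n "$parent" ]; then
    # pgrep -P lists direct children of the given parent PID.
    for child in $(pgrep -P "$parent"); do
        kill "$child"     # SIGTERM; escalate to -9 only for stuck zombies' parents
    done
fi
```

Note that true zombies (already-exited processes awaiting reaping) cannot be killed; they disappear only when the parent reaps them or exits, which is why restarting the client is sometimes the only cure.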