Do the “Wait” or “Waithidden” commands have a time to live?
For example, one of the AIX Take Action options allows you to apply the patch from an NFS share. The ActionScript then contains:
// Mount NFS Source
wait sh -c "/usr/sbin/mount -o ro {parameter "NFSSource" of action} "{parameter "NFSDir" of action}""
// If the mount failed, perform cleanup tasks and quit action
However, I am concerned that the mount command may just hang (instead of failing). When this occurs outside of BigFix I have to manually kill the mount process and start again.
So, basically, my question is: how does the “wait” command handle a hanging process?
Wait will wait until the command returns – no timeout.
You can use a shell script to force a timeout if you’d like, but unfortunately you won’t find a mechanism in BigFix to do it.
You can, however, use run or runhidden instead of wait and then loop in the shell, sleeping for a couple of seconds between checks on whether the mount has completed (and maybe retrying it if it failed).
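To illustrate the shell-side timeout suggested above, here is a minimal sketch of a POSIX sh wrapper. The function name run_with_timeout, the one-second polling interval, and the 124 return code (borrowed from GNU timeout) are my own choices rather than anything BigFix provides, and NFS_SOURCE / NFS_DIR in the usage comment are placeholders.

```sh
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Starts COMMAND in the background, polls once per second, and kills it
# if it is still running after SECONDS.
# Returns COMMAND's exit status, or 124 if it had to be killed
# (124 is an assumed convention, not a BigFix requirement).
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill -TERM "$pid" 2>/dev/null   # ask politely first
            sleep 1
            kill -KILL "$pid" 2>/dev/null   # then force it
            # Deliberately no "wait" here: an unkillable process
            # would make the wrapper hang as well.
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command finished on its own; hand back its exit status.
    wait "$pid"
}

# Hypothetical usage for the mount case (placeholder variables):
#   run_with_timeout 60 /usr/sbin/mount -o ro "$NFS_SOURCE" "$NFS_DIR"
```

Because the wrapper returns shortly after the limit at the latest, a script like this could be called with wait from the action. One caveat: a mount stuck in an uninterruptible kernel wait (as can happen with a hard NFS mount) may ignore even SIGKILL, so the wrapper hands control back to the caller but cannot guarantee the process is gone.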
That’s what I was afraid of
In a perfect world I won’t have any process hanging… but to avoid having to edit the default ActionScript behind ‘Take Action’ for out-of-the-box fixlets, I will just have to monitor the progress of my fixlets, and if any of them is taking longer than expected, I can stop it and try again.
Just to also confirm my next suspicion.
When I click ‘Stop’ on an action…it will only stop actions on endpoints where they have not already started. It will not stop any currently running action.
So if the process is hanging, I will have to ssh into the box and kill that hung process.
Unfortunately there are all kinds of different “things that can go wrong” when deploying actions. I have Java upgrades hang on a fairly regular basis, and a few of the Office updates as well. It would certainly be nice to have a client setting for something like “maximum duration of wait command”. As it stands now, the fact that Actions are run in serial means that you cannot use the BES client to fix itself if a ‘wait’ command is hung. We have to resort to outside watchdogs, such as a Scheduled Task to monitor the BES Client log and restart the BES Client service if it looks like things have gone awry.
I think it would be very useful to be able to set a high value at the start of a Baseline containing something like a Service Pack or OS upgrade, and then maybe turn it down to an hour or so for “normal” actions. I’m pretty sure I’ve put in an RFE for that but I haven’t seen a lot of feedback.
I’ve been burned a few times on Windows systems when an application that is supposed to be “silent” displays a dialog box and leaves the Client unable to continue.
I’d love to be able to configure a Timeout as well. If you remember the RFE number I’d go vote for it!
The issue with just erroring out of the fixlet (or continuing the fixlet) is that a file is very likely still in use on the system (potentially an executable in the download folder), so the client won’t be able to continue anyway.
The only way this would really work would be to kill the process launched by waithidden.
It seems like, instead of a global setting, it should be a setting of the action itself.
I’d be happiest with the client killing the process it started after a timeout period.
This should result in a Failure for the Action.
The timeout could be client based or action/command based. I would think a default of 30 minutes would be a good start, but it would depend on the product being installed.
The biggest problem I have is that we have to use an external system to monitor the client. If the BES Client would unblock, we could potentially have other Actions perform whatever cleanup we need.
It would be best if it performed some kind of cleanup. I think in the case of a timeout, at the very least the action should Fail. Additional options might include killing the process tree of the executable that was called; killing any process that has a file handle open under __Download; and setting a registry entry to flag that an action has been killed (similar to “pending restart”, we might have an inspector for “action timeout”).
You can institute a timeout in actionscript now without any need for an RFE, but it is a manual process and not as easy as it should be.
It requires you to use the RUN command instead of the WAIT command, then use a PAUSE WHILE statement with a timeout immediately after. Then if the timer expires and the command is still running, you can kill it in the actionscript.
Something like this: (this is not exact)
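// Launch without blocking, then record the start time
run program.exe
parameter "start_1"="{now}"
// Hold here while the process is alive, for at most 360 seconds
// ("process" inspector assumed; older Windows clients may need "running application")
pause while { (now - time (parameter "start_1") < 360*second) AND exists process whose (name of it as lowercase = "program.exe") }
// Still running after the timeout: kill it along with its child processes
if { exists process whose (name of it as lowercase = "program.exe") }
wait taskkill /F /T /IM program.exe
endif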
I like the RFE idea (which I’ve already voted for), as it prevents having to edit the default actionscripts of BigFix tasks/fixlets… Bodge jobs never get the appreciation they deserve, and when something goes wrong it’s inevitably the bodge that takes the blame.
The global settings are intended to allow the client to ‘eventually’ start responding again in case an action spawned with wait or waithidden is never going to complete, for instance if it is trying to display a message that cannot be acknowledged because the process does not have a desktop window. Ideally this value should be set at an hour or higher, and even longer when doing long-running operations such as an OS reinstall.