Linux/UNIX has a handy console command, “timeout”, which runs a process and kills it if it does not complete within a certain amount of time. This can keep a command from stalling a script if it hangs or tries to interact with a user session that doesn’t exist.
Windows lacks any such command natively, and I’ve struggled to find an approximation. Here’s a tip I hope is helpful for someone else.
This example simply runs notepad.exe, lets the process run for up to 30 seconds from the start of the action, then kills it if it is still running. It should be adaptable to other processes when the executable name is known.
runhidden notepad.exe
pause while {exists processes whose (name of it = "notepad.exe" and ppid of it = pid of service "BESClient") AND (now - active start time of action < 30 * second)}
if {exists (ids of processes whose (name of it = "notepad.exe" and ppid of it = pid of service "BESClient"))}
waithidden taskkill.exe /pid {(ids of processes whose (name of it = "notepad.exe" and ppid of it = pid of service "BESClient"))}
endif
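For comparison, the same "wait up to N seconds, then kill" behavior can be sketched in Python with the standard subprocess module. This is only an illustrative analog, not a BigFix feature; run_with_timeout is a hypothetical helper name and the 1-second limit in the demo is arbitrary.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Start cmd, wait at most timeout_seconds, kill it if still running.

    Returns (exit_code, timed_out). If the process exits early we return
    immediately instead of waiting out the full timeout.
    """
    proc = subprocess.Popen(cmd)
    try:
        # wait() returns as soon as the child exits, or raises on timeout.
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        proc.kill()   # TerminateProcess on Windows, SIGKILL on POSIX
        proc.wait()   # reap the killed process
        return proc.returncode, True


if __name__ == "__main__":
    # A child that would sleep for 10 s is killed after 1 s.
    code, timed_out = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(10)"], 1)
    print("timed out:", timed_out)
```

As with the action, a fast process costs only as long as it actually runs. Note that proc.kill() terminates only the direct child, not any processes it started.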
I suppose you could create a scheduled task that runs right away and only runs once, but it isn’t an ideal way to run arbitrary commands with a maximum runtime.
You could use ping, timeout, or sleep to wait. What that doesn’t do is kill any hung child processes started by the Action.
For instance, I’ve had cases where wget.exe or Symantec’s smc.exe processes hang, leaving the __Download folder locked so that subsequent actions cannot clear it to handle their own downloads. So I needed a way to detect and kill these problem processes.
The advantages of this method are that I know I’m killing the correct instance of wget.exe or smc.exe, because I’m locating the one whose parent PID is the BESClient service, and that I don’t necessarily wait the full period: if the child process completes before the timeout, the Action can continue.
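The parent-PID filter can be illustrated in Python over a snapshot of the process list. The Proc record, the children_named helper, and all PIDs below are hypothetical; in a real action the relevance expression does this lookup directly.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str


def children_named(procs, name, parent_pid):
    """PIDs of processes named `name` whose direct parent is parent_pid.

    Names are compared case-insensitively, as Windows image names are.
    """
    wanted = name.lower()
    return [p.pid for p in procs
            if p.name.lower() == wanted and p.ppid == parent_pid]


# Hypothetical snapshot: two wget.exe instances, only one started by BESClient.
snapshot = [
    Proc(pid=3796, ppid=600, name="BESClient.exe"),
    Proc(pid=4100, ppid=3796, name="wget.exe"),
    Proc(pid=5200, ppid=2900, name="wget.exe"),  # a user's own copy
]
print(children_named(snapshot, "wget.exe", 3796))  # [4100]
```

The user's copy of wget.exe (5200) is never selected, because its parent is not the BESClient PID.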
With some Symantec operations, I’ve seen instances where an smc.exe process might complete immediately, or might take up to two minutes before completing successfully. With this method, I could set a timeout of 10 minutes, and if smc.exe completes before that time the Action can complete without further delay.
As far as I know, the method you provide at the top is the only option available to do this on Windows, particularly in a native way without using a third-party executable.
Obviously others are misunderstanding the problem and the solution. You are setting a maximum threshold on how long an item is allowed to execute. If it takes less time, fine, the code just moves on; if it takes too long, it is killed. You are not setting a minimum wait time, which is much easier to do with a pause while statement alone, without the need to execute anything.
I like this addition:
I’ve never used that before, but it makes a lot of sense. It is the best way to make sure you are closing the process spawned by the BESClient and not another process that a user may be using at the moment. I guess I was just lucky and never hit that situation before.
Just a note for anyone who tries pid of service "name": this inspector didn’t exist until 9.1. If you are running an older version, or using an old QnA or Fixlet Debugger, it will not work.
Q: pid of processes "BESClient.exe"
A: 3796
T: 5.491 ms
Q: pid of services "BESClient"
A: 3796
T: 0.119 ms
Q: pid of services "BESClient" = pid of processes "BESClient.exe"
A: True
T: 200.542 ms
Note: I had to run the Fixlet Debugger with PsExec to get this to work.
Jason, thanks for providing this snippet. Very useful. A few things to note for others who are using this or have a similar use case.
Don’t attempt to test this out in the Fixlet Debugger; use a “real” task or Fixlet instead. While active start time of action doesn’t return an error in the Fixlet Debugger, it apparently returns a time in the year “10061”. Not sure if that’s way back in time or way in the future.
Once I was testing in a real action, the pause while condition kept evaluating as false sooner than I expected. I soon realized that the “other stuff” my action did before this line was taking longer than 30 seconds from the start of the action, so by the time it reached the pause while, 30 seconds had already passed and it didn’t pause.
I realized what I really wanted to say was: OK, run this process and wait up to X seconds from now. It’s a slightly different variant of Jason’s use case, so I adjusted the code to account for that. Below is an example where I check for a file that will be present when some other process completes. If it’s not there after the timeout, I want to fail the action, so I use a continue if statement. This code also works in the Fixlet Debugger if needed.
parameter "TimeToWaitFromNow" = "{now + (30 * second)}"
parameter "CompletionFile" = "C:\Temp\test.txt"
pause while {not exists file (parameter "CompletionFile") and (now < parameter "TimeToWaitFromNow" as time)}
continue if {exists file (parameter "CompletionFile")}
delete "{parameter "CompletionFile"}"
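A rough Python equivalent of this variant is sketched below, assuming a hypothetical wait_for_file helper and a half-second polling interval. The key point is that the deadline is fixed when the wait begins, like the TimeToWaitFromNow parameter, rather than at the start of the job, which is what tripped up the active start time of action approach.

```python
import os
import time


def wait_for_file(path, timeout_seconds, poll_interval=0.5):
    """Poll until `path` exists or timeout_seconds pass, counted from now.

    Returns True if the file appeared in time, False on timeout.
    """
    # Deadline is fixed when the wait starts, not when the job started.
    deadline = time.monotonic() + timeout_seconds
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


# Hypothetical usage mirroring the action: fail if the file never shows up.
# if not wait_for_file(r"C:\Temp\test.txt", 30):
#     raise SystemExit("completion file not found before timeout")
```

Returning False instead of raising leaves the "fail the action" decision to the caller, much as the separate continue if line does.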
Say a BigFix task calls a .bat file, which calls a .jar file, which of course runs Java. Will the PPID of the java.exe process still be the PID of the BESClient service?
No, you’d have to walk back the chain. The PPID of java.exe should be the PID of the cmd.exe, and the PPID of cmd.exe should be the PID of besclient.exe.
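Walking back the chain can be sketched in Python against a snapshot that maps each PID to its parent PID. The parent_of dictionary, its PIDs, and the is_descendant name are hypothetical; on a real system the map would be built from the process list.

```python
def is_descendant(pid, ancestor_pid, parent_of, max_depth=64):
    """True if ancestor_pid appears anywhere above pid in the parent chain."""
    current = pid
    # Bounded loop: Windows keeps a child's recorded parent PID after the
    # parent exits, and PIDs get reused, so the chain can loop.
    for _ in range(max_depth):
        current = parent_of.get(current)
        if current is None:
            return False
        if current == ancestor_pid:
            return True
    return False


# Hypothetical snapshot: java.exe 4000 -> cmd.exe 3900 -> BESClient 3796.
parent_of = {4000: 3900, 3900: 3796, 3796: 600, 5000: 1200}
print(is_descendant(4000, 3796, parent_of))  # True
print(is_descendant(5000, 3796, parent_of))  # False
```

Here java.exe’s direct PPID is 3900 (cmd.exe), so a one-level "ppid of it = pid of service" check misses it, while the chain walk finds BESClient two levels up.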
@strawgate - yep, that’s what I’m doing. This is for the CIS-CAT batch file (which calls a .jar file, which sometimes just hangs ad infinitum). I’m killing java.exe if it’s still running 15 minutes after the script launches, but I’m worried that I may also kill Java if it is running for a legitimate production purpose. So I really need to nail the whole "kill this PID whose PPID = PID of "…
I tried including these 4 lines (the first of which is to give Java time to spawn before checking to see if it exists):
pause while {(now - active start time of action < 90 * second)}
pause while {exists processes whose (name of it = "java.exe" and ppid of it = pid of process "cmd.exe") AND (now - active start time of action < 900 * second)}
if {exists (ids of processes whose (name of it = "java.exe" and ppid of it = pid of process "cmd.exe")}
waithidden taskkill.exe /pid {(ids of processes whose (name of it = "java.exe" and ppid of it = pid of service "BESClient")}
The third line failed, and I’m trying to figure out why. The batch file calls a .jar file, which calls Java, so I’m not sure what went wrong.
I need to look more into how to determine PPIDs quickly, so I can catch all of these while I manually run CIS-CAT and figure out the parent-child relationships.
The issue is probably that java.exe’s PPID is the PID of cmd.exe, whose PPID in turn is the PID of BESClient.
What I’m getting at is that taskkill /T kills the whole process tree. If you run taskkill /T on the cmd.exe your action launched, it will kill any child java.exe processes (and anything else under it) without you having to walk the whole process list to work out what belongs to your action and what doesn’t. taskkill /T won’t kill Java processes that weren’t launched by BigFix.
Something like…
waithidden taskkill.exe /T /pid {ids of processes whose (name of it = "cmd.exe" and ppid of it = pid of service "BESClient")}
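To see which processes a tree kill like this would target, here is a Python sketch that collects a root PID and all of its descendants from a hypothetical PID-to-parent snapshot. It illustrates the idea only; it is not how taskkill is implemented, and process_tree is a made-up helper name.

```python
from collections import deque


def process_tree(root_pid, parent_of):
    """Return root_pid followed by all of its descendants (breadth-first)."""
    # Invert the PID -> parent map into parent -> [children].
    children = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)

    tree, seen, queue = [], set(), deque([root_pid])
    while queue:
        pid = queue.popleft()
        if pid in seen:          # guard against stale/reused PIDs forming loops
            continue
        seen.add(pid)
        tree.append(pid)
        queue.extend(children.get(pid, []))
    return tree


# Hypothetical snapshot: cmd.exe (3900) under BESClient (3796) runs java.exe
# (4000), which spawned 4100; java.exe 5000 belongs to an unrelated parent.
parent_of = {3900: 3796, 4000: 3900, 4100: 4000, 5000: 1200}
print(process_tree(3900, parent_of))  # [3900, 4000, 4100]
```

The unrelated java.exe (5000) never enters the set, which is the property that makes killing from the action’s cmd.exe safer than matching on the name java.exe alone.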
No joy; it has been running for over 30 minutes so far (when it should have been killed at 15); see attachment. For some reason, it’s not exiting the “pause while” clause.