Wait command is executed on OS level but Agent never realizes

rsanchez · July 12, 2018, 6:37pm

Hey there.

I have been experiencing an issue on several agents when I launch OS commands such as:
wait /bin/sh -c "echo Output of my script >> /var/opt/BESClient/mylog.log"
The command itself is executed, and I can see that the log file contains what I inserted with the echo, but the agent never receives an exit code back, so the action never continues and the agent keeps waiting forever. What is even worse, the issue seems to keep the agent from processing any other activity, even contacting the BES Server, so eventually those agents appear in the console as offline.

I read some threads in the forum where people had something similar in the past, but in their case the command never ran in the OS although they received a valid exit code. My issue is exactly the opposite: the agent launches the command and it runs without issues in the OS, but I never receive an exit code, so the agent keeps waiting forever.

Has anyone else experienced something like this before?

Thank you so much and regards.

Aram · July 13, 2018, 12:15pm

If you run /bin/sh -c "echo Output of my script >> /var/opt/BESClient/mylog.log" from the shell, does it return to the shell?
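One way to run that check outside the agent is sketched below. It is only an illustration: the function name run_check and the stand-in log path are assumptions, and on an affected endpoint you would point it at /var/opt/BESClient/mylog.log instead.

```shell
#!/bin/sh
# Sketch: run the redirecting command directly and report the exit status
# that sh hands back to its parent.
# run_check and the LOG path below are illustrative names, not part of BigFix.
run_check() {
    log="$1"
    # Same shape as the command in the wait line, with the log path quoted.
    /bin/sh -c "echo Output of my script >> \"$log\""
    status=$?
    echo "exit status: $status"
}

# Stand-in path (assumption); replace with the real log file when testing.
LOG="${TMPDIR:-/tmp}/mylog_check.log"
run_check "$LOG"
tail -n 1 "$LOG"
```

If the prompt comes back and an exit status is printed, the shell side is returning normally, which would point toward how the agent collects the exit code rather than toward the command itself.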

In the meantime, something that may help is to leverage the timeout_seconds option of the override actionscript command to prevent the action from impacting the Client: https://developer.bigfix.com/action-script/reference/execution/override.html

rsanchez · July 17, 2018, 7:41am

Hello.

Yes, the command is executed at OS level and we can see the content of the file, as expected. But when this behavior happens, the agent does not continue with its tasks; it seems to wait forever for the exit code. We know that the agent is not hung, because we can check the service and stop it without issues, with no need to kill it. It just seems to be waiting forever, and when that happens you never see any other activity in the log until the agent is restarted.
The issue does not happen consistently, but when it does, it is always in wait commands with redirections, even ones as simple as the example I wrote.
We had already thought of using the timeout override in actionscript, and we are still considering it, but we would like to understand what is happening before adding a workaround that might affect the normal behavior of the fixlets.

JasonWalker · July 17, 2018, 12:06pm

I think I may have seen something like that if the output logfile is locked by another process. The shell was actually stuck in an error message. See whether you can reproduce that case.

rsanchez · July 17, 2018, 12:27pm

Hello Jason. In that case, I understand that the file should not be updated, and a shell process should probably still be alive if you check the OS. That is not what we see: the files are always written, and we have never found the sh process on the OS. It feels like everything works perfectly, but the agent never realizes it.

AlanM · July 17, 2018, 10:53pm

Is this literally what you have in the command? The echo command does require "" or '' around its output, so perhaps that is "ending" the quote and the redirect is causing the problem, as you can't do that on our wait command.

rsanchez · July 19, 2018, 12:02pm

Hello Alan. In fact no, that is roughly what the command looks like after its final transformation. The real command is more like the following:

wait /bin/sh -c "echo {(parameter "information")} >> {(parameter "outputlog")}"

Sorry for the mistake in the description of the issue.