Server Automation Delays

Hey folks, I have intermittent issues with Server Automation. We use it heavily at night, and every now and then I’ll see the parent XML get submitted via the REST API (I see the parent action), but then nothing happens for a few hours, after which it blasts out every child action that has been “queued” up. I could have three parent XML jobs sitting on the action line, spaced an hour apart, with no child jobs for hours, and then they all start kicking off. Very often the delay starts near midnight, and a little after 3am they all start running. There is nothing in the Event Viewer on the main server and, to my knowledge, no scheduled maintenance.

I don’t change any of the plan settings or anything like that; every now and then there is just a long pause (hours) before they all decide to kick off. In the automation logs, I see nothing during that “pause” window, and nothing of note in the automation plan logs or anywhere else I can think to look. Has anyone else seen anything like this? I am on v10.0.3.66 currently, but I saw it back in the 9.x.x versions as well. If anyone has any insight, I would appreciate it. Thanks in advance.

There are several possible causes. My suggestions:

  • Verify the server’s IO and CPU load at that moment.

  • Verify whether bulk REST API jobs are being triggered, or whether several automation jobs are running at that time.

  • Verify FillDB; it may be the main reason the console is not updating with a status.

  • Do you use the device name or the computer ID when targeting devices through the REST API? If you use the device name, name resolution will take longer, whereas targeting by computer ID will always be quick.

  • Check whether any Endpoint Control operations were carried out within that period, as they may result in needless scanning or detection.
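To line these checks up against the exact pause window, it can help to pull the silent stretches out of the automation log first. The following is only a rough Python sketch: the timestamp format, the `parse_timestamps` and `find_gaps` helpers, and the sample lines are all assumptions made up for illustration, not real Server Automation log output, so adjust `fmt` and `width` to whatever your logs actually look like.

```python
from datetime import datetime, timedelta


def parse_timestamps(lines, fmt="%Y-%m-%d %H:%M:%S", width=19):
    """Parse the leading timestamp of each line (assumed format); skip lines that do not match."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line[:width], fmt))
        except ValueError:
            continue  # not a timestamped line, ignore it
    return stamps


def find_gaps(timestamps, threshold=timedelta(minutes=30)):
    """Return (start, end, duration) for every silent stretch longer than threshold."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later, later - earlier))
    return gaps


# Hypothetical log lines, invented for this example only.
sample = [
    "2024-01-10 23:40:12 parent action submitted",
    "2024-01-10 23:55:03 step 1 started",
    "2024-01-11 03:07:45 step 1 child action created",
    "2024-01-11 03:08:10 step 2 child action created",
]

gaps = find_gaps(parse_timestamps(sample))
for start, end, length in gaps:
    print(f"silent from {start} to {end} ({length})")
```

With the start and end of each gap in hand, you can compare them against CPU and I/O counters, FillDB activity, and any scheduled jobs for the same window.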

The problem is not just the console failing to update. Per the application logs, Server Automation is doing nothing at all: nothing for a few hours, and then it starts working. All other actions from the console work just fine; it’s only Server Automation. I don’t see anything in the event logs indicating that the service has crashed or anything like that. It just does nothing until, at a certain point, it starts working and runs everything that has been queued up in Server Automation. I am just calling plans that are already targeted against dynamic groups in the plan.

I would recommend opening a case with the HCL product support team.
