Have client post Report faster?

I’m executing tasks in IEM via the REST API and then polling the status of the action every 30 seconds (also via the REST API). While testing, I’m seeing that after the script execution in the action completes (with an Exit Code = 0), it can take up to 90 seconds for the client to log “Report posted successfully”. This 90-second delay causes the action status to remain at Running even though execution has actually already completed, which slows down the automation calling IEM.

Is there a way for the report to be posted sooner?
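For context, the polling loop looks roughly like this. This is a minimal sketch: the server URL and action ID are placeholders, and the XML shape of the `/api/action/[ID]/status` response is simplified here, so check your server's actual output before relying on it.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://iem.example.com:52311"  # placeholder root server URL
ACTION_ID = 123                          # placeholder action ID

def parse_statuses(xml_text):
    """Collect the text of every <Status> element in the response.

    The real response nests a per-action status and per-computer
    statuses; this simplified parser just grabs them all."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("Status")]

def wait_for_completion(opener, poll_seconds=30, timeout=900):
    """Poll the action status until every reported status is terminal."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with opener.open(f"{BASE}/api/action/{ACTION_ID}/status") as resp:
            statuses = parse_statuses(resp.read())
        if statuses and all(s in ("Completed", "Failed") for s in statuses):
            return statuses
        time.sleep(poll_seconds)
    raise TimeoutError("action did not complete in time")
```

The 90-second client-side reporting delay means each pass through this loop sees a stale status, even though the script on the endpoint has already exited.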

It is not just the amount of time it takes for the report to be sent from the client to the relay, but then also the time it takes for the relay to pass it up the chain to all other relays above it, then to the root, then the root to ingest it.

You can set the minimum report time to be more aggressive, but doing this for all clients is a bad idea if there is a very large number of them in your environment. The clients will report at that interval nearly all the time, which puts more strain on your relays and root server.

Why are you polling the status of the action at all? What is the use case?

An external workflow calls the IEM task to execute and then polls the status of the action every 30 seconds to know when it completes successfully, so that the workflow can move on to the next steps.

Yup, I understand that the relays, root server, etc. are in the mix for the post timings, but I don’t think they affect how long it takes for the client to actually evaluate and post the results upstream, right? What is the client waiting for before posting the results?

I wish there were a client setting that could be set to post the results faster. Maybe increasing BES Client CPU usage would help, but I’d like to stay away from that for now.

You don’t need to wait until the first step in the workflow completes if the next step runs on the same endpoint and you can write relevance to detect when the first step has finished, so that the second step doesn’t start until that point. If you can do this, you can deploy both steps at the same time and the client will execute them as fast as possible, with no waiting in between. The same concept can be applied to an unlimited number of actions on the same endpoint, chaining them together as needed.
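One way to sketch this chaining idea: deploy both actions at once, where the second action's relevance gates on a marker file the first action's script creates. The XML below is a simplified stand-in for a BES SingleAction document (the real schema has more required fields and should be validated against BES.xsd), and the marker file path is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker file that step 1's script creates on success.
MARKER = r"C:\temp\step1.done"

def single_action_xml(title, script, relevance):
    """Build a minimal, simplified BES SingleAction document.

    This is a sketch of the structure only -- the real BES schema
    requires additional elements."""
    bes = ET.Element("BES")
    action = ET.SubElement(bes, "SingleAction")
    ET.SubElement(action, "Title").text = title
    ET.SubElement(action, "Relevance").text = relevance
    ET.SubElement(
        action, "ActionScript",
        MIMEType="application/x-Fixlet-Windows-Shell",
    ).text = script
    return ET.tostring(bes, encoding="unicode")

# Step 1 is always relevant; step 2 only becomes relevant once the
# marker exists, so both can be submitted to the server at the same time.
step1 = single_action_xml("step 1", "run step1.cmd", "true")
step2 = single_action_xml("step 2", "run step2.cmd",
                          f'exists file "{MARKER}"')
```

Because the gating happens in relevance on the endpoint itself, the external workflow never has to poll between the two steps.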

This is why I asked what your use case is, because all of this polling and waiting around may not be needed at all.


There are settings to make it report faster. _BESClient_Report_MinimumInterval sets the minimum time between reports. Make it lower on a particular client, and it MAY report more often if it can.

You are correct that the CPU usage of the client is involved, but it is automatically higher while an action is executing. Raising it would help mostly when an action isn’t running.

Hi, to clarify: the 2nd step in the workflow isn’t executing against IEM; it is just waiting for IEM to finish. The 3rd step in the workflow can only start after the IEM action was a success (or after a period of time, when a timeout is recorded).

  1. REST API call to IEM to execute task
  2. Poll IEM action status for completion
  3. Proceed with next steps in workflow (not IEM related)

Read about the setting here: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Configuration%20Settings?section=_BESClient_Report_MinimumInterval

If you’re using a relevance query you’ll also have the fact that this goes to web reports and there’s a refresh interval in web reports.


This is a very good point.

How long does it normally take for this @cstoneba ? What is the current speed that you’d like to see be faster?

The issue may have more to do with WebReports than the client report interval.

This is part of why I’d recommend having a WebReports server just for the REST API with a faster refresh.

One thing to consider is the number of properties the endpoints are evaluating at each report, as well as the computational cost of each of them. When a device needs to send up its action status, it triggers the reporting loop, where it will process all properties that need to be evaluated at that time (I believe the action is paused until this completes). If you have many properties reporting at low intervals (e.g. every report, every 5 minutes, etc.), they will add to the delay. I have seen computers take 2 minutes between action status updates due to numerous or poorly optimized properties.

Another thing, as @jgstew mentioned, is that results need to go up the relay chain until it reaches the root server. From my observation, each relay will hold onto a report for a second or two before sending it up to its parent. The reason for this is so that it can combine reports from multiple devices into one smaller [compressed] bundle (low bandwidth site optimization). If your clients are 4 relays away from the root server you might be spending ~4-8 seconds just between the relays and the root server.

Once the report arrives at the root server, it is up to FillDB to import the data into the database. Another observation I had is that FillDB will wait until at least 1,000 reports are ready to be processed or 10 seconds have elapsed since the last batch import. If the report you’re waiting for came in while the previous batch import was running, it would have to wait at least another 10 seconds before it is processed. Depending on the server/database configuration and the number of report changes included in that batch, it may take as little as one second or as long as 20 or more seconds for the data to be committed to the database (I have seen 45+ second batches due to broad force refreshes being sent to devices, with ~10+ minutes spent handling the backlog of batches that needed to be processed).

Once FillDB processes the data, /Action/[ID]/Status should reflect the updated information. However, as @gearoid mentioned, if you’re using /query it’ll be bound by your web reports update interval (default is 15 seconds, but may be increased in your environment).


Hi, I realize that FillDB, the backend SQL, the number of relays between the client and the root, etc. all play a part in slow results. But for this I’m looking only at the besclient log file and the time it takes between the “Command completed successfully” line and the “Report posted successfully” line, as this delay slows down all the upstream pieces mentioned above, and that client-side timing should have nothing to do with any upstream IEM infrastructure.

For checking the action status, we are using the /Action/[ID]/status method so there’s no concern that it is being routed through WebReports.

So @etorres, you’re saying the Evaluation Cycle the client runs has to run through before it can post a report after action execution?

The last time I checked (it’s been a long while, so I can’t say if something has changed), any time the computer needs to send up any type of information, such as fixlet applicability, changes to property data, or action statuses, it starts its reporting logic, groups all the information together, and sends it up in one report.

If you want to get a sense as to what it’s doing between the completion and the report posted messages, enable the debug logs through the EMsg client settings, restart the client, then watch the log file you specified for the report logic lines.

Windows: http://www-01.ibm.com/support/docview.wss?uid=swg21505962
Non-Windows: http://www-01.ibm.com/support/docview.wss?uid=swg21506110

The start of reporting logic will be identified with “VerboseMessage Entering Reporting Logic”.
Counters containing the elapsed time for each internal process will contain “ReportTimer”.
Property evaluations will contain “EvalLog [sitename].[analysis-id]:Evaluate Property [Number]”.
The amount of time spent evaluating all properties will be recorded in a line containing “ReportTimer EvaluateProperties”.
I believe the exit from the reporting loop is identified by “EvaluationManager::Restart Idle All”.

Looking for gaps of time between the property evaluation lines should give you a hint as to where those ~90 seconds are being spent.
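A small script can do the gap-hunting for you. This is a sketch: the timestamp regex assumes client log lines begin with something like `At 14:02:33 ...`, which can vary by client version, so adjust the pattern to match your logs.

```python
import re
from datetime import datetime

# Assumed leading-timestamp format on BES client log lines, e.g.
# "At 14:02:33 -0700 - ..." -- adjust for your client version.
TS = re.compile(r"^At (\d{2}:\d{2}:\d{2})")

def find_gaps(lines, threshold_seconds=10):
    """Return (gap_seconds, line) pairs for each timestamped line that
    arrived threshold_seconds or more after the previous one."""
    gaps, prev = [], None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev is not None:
            delta = (t - prev).total_seconds()
            if delta >= threshold_seconds:
                gaps.append((delta, line.rstrip()))
        prev = t
    return gaps

# Usage: feed it the debug log and inspect the biggest offenders.
# with open("EMsg.log") as f:
#     for secs, line in find_gaps(f, threshold_seconds=10):
#         print(f"{secs:6.0f}s before: {line}")
```

The lines following the biggest gaps (typically `EvalLog ... Evaluate Property` entries) point at the properties eating the reporting time.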

If you can’t do anything with those properties to lower the amount of time they take, I think the only other option would be to use the _BESClient_Report_* settings to increase the CPU utilization.

You may have properties that are currently set to be evaluated on every report. This may not be required and you could lower the frequency. That can speed up report time.


@gearoid makes a very good point about the report frequency of custom properties. In general, they should NEVER be set to every report. I would recommend once every 6 to 12 hours for almost everything. If it is a property that is important to be very fresh, then I wouldn’t set it any lower than once every 15 minutes in most cases.


In my haste it looks like I forgot to add some information to my post… (whoops)

As @gearoid and @jgstew mentioned, increasing the reporting intervals for custom properties should help. However, built-in properties and those you receive from external sites (e.g. the application, hardware, and operating system information analyses in the BES Inventory and License site) cannot be modified.

So, if custom properties are taking a while to run you can look at increasing the refresh intervals or optimizing the relevance.

While looking at one of my test systems I just realized someone activated the “Installed Windows Patches Information” analysis in the BigFix Labs site and that computer spends 13 seconds on each report to update that information… I’m thinking about making a custom copy to drop it from every report to 6 hours…


You can also increase the minimum analysis interval, which sets a floor on how often a property set to “every report” is evaluated. If you make it 5 minutes, a property set to “every report” and a property set to “5 minutes” will effectively have the same report interval.
