Relays / Agents not reporting with current date and time in Console?

Hi All,

We are facing an issue where all of our relay servers and the main IEM server are not reporting to the Console with the current date and time. After checking the logs, we are seeing the following errors:
Error posting report to: ‘http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe’ (General transport failure.
http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe’ http failure code 503)
At 12:14:06 +0530 -
ForceRefresh command received. Version difference, gathering action site.

The server can reach the internet, I am able to ping all the relay servers, and communication appears to be working properly, but still no luck.

Thanks
Mayank

Hello Mayank,

The error above indicates that the Client is not able to post to itself, and the 503 suggests that the cause may be a full buffer directory. If the Root Server’s FillDB buffer directory is full, it can affect the ability of downstream Relays (and Clients) to report in consistently. Please see http://www-01.ibm.com/support/docview.wss?uid=swg21595524 for additional information, including how to turn on performance logging that might help provide more context.
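To check whether the buffer directory really is filling up, a quick file count and size tally can help. The following is only a minimal sketch: the directory path is not given in this thread, so it is passed in as an argument, and `max_count` and `max_bytes` are hypothetical stand-ins for whatever limits are configured on your server.

```python
import os


def buffer_dir_usage(path):
    """Return (file_count, total_bytes) for all regular files under path."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file may have been consumed between listing and stat.
                continue
            count += 1
    return count, total


def is_near_limit(count, total_bytes, max_count, max_bytes, threshold=0.9):
    """True if either the count or the size is at or above threshold * limit.

    max_count and max_bytes are assumptions supplied by the caller; this
    sketch does not read the real limits from the server configuration.
    """
    return count >= threshold * max_count or total_bytes >= threshold * max_bytes
```

Running `buffer_dir_usage` against the directory every few seconds and watching whether the numbers stay pinned near the limit gives a rough picture of whether the directory is draining at all.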

As @Aram pointed out, it seems to be a problem of overloaded infrastructure. Check whether the relays and the BigFix server are writing data to the buffer directory.

Hi Aram

Thank you for such a wonderful piece of information; it works.
But the bufferdir is getting overloaded every second. It seems to be writing files to the DB, but it fills up again and again.

Any idea how to increase its size?
I have made some registry changes to increase the size, but it still shows the old value.
Please let me know how to make the bufferdir stable and how to increase its size.

Thanks
Mayank

I don’t recommend increasing the bufferdir’s size or count as the initial step, as this can mask the issue or make it worse. The Root Server must be able to process data as fast as it’s coming in (or ideally faster). To this point, I’d recommend enabling FillDB’s performance logging to gauge the data insertion rate (which might also help identify if there are periodic blocks to writing into the database).
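The reasoning here is simple arithmetic: if reports arrive faster than they are inserted, a larger buffer only delays the moment it fills. A minimal sketch follows, with made-up numbers; the rates and capacities are hypothetical and not taken from any real deployment.

```python
def seconds_until_full(arrival_per_sec, drain_per_sec, capacity, backlog=0):
    """Return seconds until the buffer reaches capacity, or None if it never fills.

    All inputs are hypothetical: arrival_per_sec is how fast reports come in,
    drain_per_sec is how fast they are inserted into the database, and
    capacity is the maximum number of reports the buffer can hold.
    """
    net = arrival_per_sec - drain_per_sec
    if net <= 0:
        # Draining keeps up with (or outpaces) arrivals: the buffer never fills.
        return None
    return max(0, capacity - backlog) / net


# Doubling the capacity doubles the time to fill, but the buffer still fills.
small = seconds_until_full(300, 220, capacity=10000)
large = seconds_until_full(300, 220, capacity=20000)
# Raising the drain rate above the arrival rate removes the problem entirely.
fixed = seconds_until_full(300, 320, capacity=10000)
```

In this toy model only the last change, improving the insertion rate, keeps the buffer from filling, which is why measuring that rate comes first.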

Hi Aram,
Clients are reporting, but the relays and the root server are not.

Why is that?

The relay is throwing an error like: Temporarily rejection registration: parent is busy backing off

For the reasons I am suggesting above (i.e. the bufferdir on the Root Server - and likely Relays - is often full). What’s likely happening here is that the Root Server and Relays are having to contend with the other Clients that are also attempting to post reports, and so their likelihood of successfully posting a report is reduced (i.e. when they try to post a report, the local bufferdir is already full with other Client reports, and they are therefore unable to do so). We have to address the root cause (which is the full bufferdir on the Root Server…most likely caused by either poor insertion rates for the amount of incoming data, or periodic blocks preventing data insertion). The FillDB performance log can provide more useful data.

If you haven’t done so already, I’d recommend opening a PMR on this issue.

Opening a PMR is the best route but here is some other info to add to the information Aram has already provided that may be of use.

  1. Too many clients reporting directly to the main server instead of to relays will reduce the speed at which the main server can commit data to the DB, and can lead to a buildup in the bufferdir. You can use the deployment health checks dashboard (look in the BES Server Health) to check this.
  2. Having lots of clients with a low report interval and/or lots of properties that refresh every report (e.g. every 15 seconds) can generate a lot of client data that contributes to a buildup in the bufferdir.
  3. We had a case where 2 custom Fixlets were tripping over each other, which generated report data every time the action ran and led to overloading the bufferdir.
  4. If the application server and DB are on the same system and same disk, there could be an IO bottleneck.

The PMR would help you isolate the root cause and take suitable steps to correct whatever is causing your issue. :)

Regs
Rob

Aram,

Please find the performance logs below:

Thu, 02 Jun 2016 19:10:24 +0530 – 6684 – GetReservedPropertyMap: 143 ms

Thu, 02 Jun 2016 19:10:24 +0530 – 6684 – New Database Boost Level values are: mergeEnabled = 0; transactionsEnabled = 0; maxBatchRate = 500

Thu, 02 Jun 2016 19:10:24 +0530 – 6684 – GetAnalysisPropertyMappings: 409 ms

Thu, 02 Jun 2016 19:11:28 +0530 – 6684 – GetStatisticalPropertyIDs: 64237 ms

Thu, 02 Jun 2016 19:11:33 +0530 – 6684 – GetNonReportingFixlets: 4349 ms

Thu, 02 Jun 2016 19:11:34 +0530 – 6684 – GetComputersWithCertificates: 1003 ms

Thu, 02 Jun 2016 19:11:34 +0530 – 6684 – UpdateMappings complete in: 70001 ms

Thu, 02 Jun 2016 19:11:34 +0530 – 6684 – Parsing: 108 messages in 434 ms: 248 messages/sec

Thu, 02 Jun 2016 19:11:34 +0530 – 6684 – computer sequences: 87 rows in 346 ms: 251 rows/sec

Thu, 02 Jun 2016 19:11:34 +0530 – 6684 – computer relay statuses: 14 rows in 65 ms: 215 rows/sec

Thu, 02 Jun 2016 19:13:49 +0530 – 6684 – Fixlet results: 30135 rows in 134996 ms: 223 rows/sec

Thu, 02 Jun 2016 19:13:51 +0530 – 6684 – computer administrators: 15 rows in 1212 ms: 12 rows/sec

Thu, 02 Jun 2016 19:13:51 +0530 – 6684 – computer roles: 15 rows in 1 ms: 15000 rows/sec

Thu, 02 Jun 2016 19:13:52 +0530 – 6684 – long property results: 94 rows in 1391 ms: 67 rows/sec

Thu, 02 Jun 2016 19:14:42 +0530 – 6684 – short property results: 10629 rows in 50029 ms: 212 rows/sec

Thu, 02 Jun 2016 19:14:43 +0530 – 6684 – action results: 62 rows in 258 ms: 240 rows/sec

Thu, 02 Jun 2016 19:14:43 +0530 – 6684 – ----------- Batch Complete: 108 messages in 259010 ms: 0 messages/sec

Thu, 02 Jun 2016 19:14:43 +0530 – 6684 – 87.04% full reports

Thu, 02 Jun 2016 19:14:43 +0530 – 6684 – GetComputerSequences: 39 ms

Thu, 02 Jun 2016 19:14:43 +0530 – 6684 – DeleteUnusedSites: 48 ms

Thu, 02 Jun 2016 19:14:43 +0530 – 6684 – TimewiseAggregateStatistics: 172 ms

Thu, 02 Jun 2016 19:14:53 +0530 – 6684 – GetAnalysisPropertyMappings: 75 ms

Thu, 02 Jun 2016 19:14:53 +0530 – 6684 – GetStatisticalPropertyIDs: 97 ms

Thu, 02 Jun 2016 19:14:53 +0530 – 6684 – GetNonReportingFixlets: 71 ms

Thu, 02 Jun 2016 19:14:54 +0530 – 6684 – GetComputersWithCertificates: 979 ms

Thu, 02 Jun 2016 19:14:54 +0530 – 6684 – UpdateMappings complete in: 1227 ms

Thu, 02 Jun 2016 19:14:54 +0530 – 6684 – Parsing: 209 messages in 359 ms: 582 messages/sec

Thu, 02 Jun 2016 19:14:55 +0530 – 6684 – computer sequences: 186 rows in 760 ms: 244 rows/sec

Thu, 02 Jun 2016 19:14:55 +0530 – 6684 – computer relay statuses: 11 rows in 58 ms: 189 rows/sec

Thu, 02 Jun 2016 19:16:11 +0530 – 6684 – Fixlet results: 16837 rows in 76043 ms: 221 rows/sec

Thu, 02 Jun 2016 19:16:12 +0530 – 6684 – computer administrators: 10 rows in 859 ms: 11 rows/sec

Thu, 02 Jun 2016 19:16:12 +0530 – 6684 – computer roles: 10 rows in 12 ms: 833 rows/sec

Thu, 02 Jun 2016 19:16:13 +0530 – 6684 – long property results: 79 rows in 838 ms: 94 rows/sec

Thu, 02 Jun 2016 19:16:50 +0530 – 6684 – short property results: 8106 rows in 37285 ms: 217 rows/sec

Thu, 02 Jun 2016 19:16:51 +0530 – 6684 – action results: 53 rows in 224 ms: 236 rows/sec

Thu, 02 Jun 2016 19:16:51 +0530 – 6684 – ----------- Batch Complete: 209 messages in 117794 ms: 1 messages/sec

Thu, 02 Jun 2016 19:16:51 +0530 – 6684 – 94.74% full reports

Thu, 02 Jun 2016 19:16:51 +0530 – 6684 – GetComputerSequences: 36 ms

Thu, 02 Jun 2016 19:16:51 +0530 – 6684 – DeleteUnusedSites: 45 ms

Thu, 02 Jun 2016 19:17:01 +0530 – 6684 – GetAnalysisPropertyMappings: 79 ms

Thu, 02 Jun 2016 19:17:01 +0530 – 6684 – GetStatisticalPropertyIDs: 88 ms

Thu, 02 Jun 2016 19:17:01 +0530 – 6684 – GetNonReportingFixlets: 68 ms

Thu, 02 Jun 2016 19:17:02 +0530 – 6684 – GetComputersWithCertificates: 1058 ms

Thu, 02 Jun 2016 19:17:02 +0530 – 6684 – UpdateMappings complete in: 1296 ms

Thu, 02 Jun 2016 19:17:04 +0530 – 6684 – Parsing: 429 messages in 1450 ms: 295 messages/sec

Thu, 02 Jun 2016 19:17:05 +0530 – 6684 – computer sequences: 275 rows in 1155 ms: 238 rows/sec

Thu, 02 Jun 2016 19:17:05 +0530 – 6684 – bad report numbers: 1 rows in 3 ms: 333 rows/sec

Thu, 02 Jun 2016 19:17:05 +0530 – 6684 – computer relay statuses: 187 rows in 805 ms: 232 rows/sec

Please advise.

As Aram has recommended, you really need to raise a PMR to get support to assist you.

Please do open a PMR on this for further support. We’d usually recommend leaving the performance logging enabled for some time (at least a few hours) to get a good number of data samples.

That said, even without more info, some of the insertion rates above are not very good (in particular Fixlet results and short property results). The following links may provide some more information/context:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Troubleshooting%20FillDB%20Operations

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Server%20Disk%20Performance
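
When several hours of performance logging have been collected, reading the rates by eye gets tedious. The sketch below tallies the "N rows in M ms" and "N messages in M ms" entries per category and computes an overall rate for each. It assumes the log looks like the excerpt posted earlier in this thread (including entries wrapped across lines); the function name and regular expression are illustrative and not part of any BigFix tooling.

```python
import re
from collections import defaultdict

# Matches e.g. "Fixlet results: 30135 rows in 134996 ms: 223 rows/sec",
# tolerating line breaks anywhere whitespace appears.
RATE_RE = re.compile(
    r"([A-Za-z][A-Za-z\s]*?):\s+(\d+)\s+(rows|messages)\s+in\s+(\d+)\s+ms"
    r":\s+\d+\s+\3/sec"
)


def summarize_rates(log_text):
    """Return {category: items_per_second} aggregated over the whole log."""
    totals = defaultdict(lambda: [0, 0])  # category -> [items, milliseconds]
    for name, count, _unit, ms in RATE_RE.findall(log_text):
        key = " ".join(name.split())  # collapse wrapped names to one line
        totals[key][0] += int(count)
        totals[key][1] += int(ms)
    return {
        key: (items * 1000.0 / ms if ms else float("inf"))
        for key, (items, ms) in totals.items()
    }
```

Because the rate is recomputed from the summed counts and durations, it also recovers the fractional value behind entries that the log rounds down, such as the "0 messages/sec" on the Batch Complete lines.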