Clients Sending Full Reports causing FillDB to be backlogged constantly

Hello Everyone,

I’m new to the forums and am seeking assistance from anyone who might have experienced a similar issue.

As the title says, a majority of my clients are sending Full Reports all the time instead of just reporting which properties / analyses have changed.

I have FillDB Performance Logging enabled and have confirmed that full reports are being sent via the following lines:

Wed, 02 May 2018 23:07:22 -0400 -- 1756 -- ----------- Batch Complete: 1202 messages in 77097 ms: 15 messages/sec
Wed, 02 May 2018 23:07:22 -0400 -- 1756 --                             93.01% full reports

If this occurred once in a while it wouldn’t be a major issue, but this example is one of the lower percentages, so it’s a constant issue in my environment of 50000 devices.

I have already checked the Application Server audit logs and see no abuse of the “Send Refresh” button.

So I’m here to ask whether anyone has had a similar issue, past or current, and could suggest settings to check or log entries to look for. Any and all help would be appreciated.

Some overall information:

  • All clients are on the open internet, no VPN / LAN connectivity, clients talk to a relay in the cloud and that relay talks to our app server
  • Our relays are not perfectly loaded with clients (meaning some relays have over 1000 clients, but they are Linux relays with plenty of resources)
  • BigFix version: 9.2.7.53

While this is an outdated version and there are numerous FillDB enhancements in later versions, I still feel that I would be masking an issue by updating as clients shouldn’t ALWAYS send full reports.

How are your relays exposed to the internet? You say that clients talk to a relay, but it is not clear whether you have many internet relays, each with its own internet address, or some load balancing to your relays behind a single internet address. The latter may cause the effect you see, depending on the load balancing configuration, if it uses a simple round robin algorithm rather than some kind of session persistence / connection reuse.

Clients shouldn’t be sending full reports unless triggered with the right-click “Send Refresh” option, or with the actionscript “Notify client ForceRefresh”.

I would look at your actions to see if any of them are using that actionscript.

If clients are deleted from the console, but still reporting, that may also trigger a full refresh.

One condition that forces a client refresh is triggered by the BigFix Server without any explicit action: clients are asked for a full report if there is a reporting sequence error, meaning that one or more reports from a client are missing (for example, something lost or delayed in the relay chain while newer reports arrive via a different relay path).
This is generally a symptom of an unstable relay hierarchy or network.
A message in the client’s log will flag this kind of condition, and the server’s log will show traces of the requests, but that requires turning on debugging.
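
To make the sequence-error idea concrete, here is a toy Python model. It is only an illustration of the behavior described above, not BigFix's actual implementation: the class name, the method name, and the idea of a simple per-client counter are all assumptions made for the example.

```python
# Toy model (hypothetical, NOT BigFix code): the server remembers the last
# report sequence number seen per client and asks for a full report whenever
# the next report does not follow directly on from it.
class ReportTracker:
    def __init__(self):
        self.last_seq = {}  # client id -> last sequence number seen

    def receive(self, client_id, seq):
        expected = self.last_seq.get(client_id, 0) + 1
        self.last_seq[client_id] = seq
        if seq != expected:
            # A report was lost or arrived out of order somewhere in the relay chain.
            return "request_full_report"
        return "accept"


tracker = ReportTracker()
# Report 3 never arrives (e.g. stuck on another relay), so report 4 triggers a refresh.
outcomes = [tracker.receive("client-1", seq) for seq in (1, 2, 4)]
print(outcomes)
```

In this model a single lost report is enough to force a full report, which is why an unstable relay path can inflate the full-report count without anyone pressing “Send Refresh”.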

Everyone, thank you so much for responding and for the input / advice provided. I have been working with IBM support and have found the root cause of my issue. I will provide as much detail as possible for anyone who searches for a similar issue in the future.

In my current version of BigFix, 9.2.7.53, the FillDB Performance Logging is inverted with regard to the percentage of full reports.
So the line:

Wed, 02 May 2018 23:07:22 -0400 -- 1756 -- 93.01% full reports

really means only 6.99% were full reports, not 93.01%. Not as scary a number once you know that bit of information.
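
For anyone who wants to check their own logs, here is a minimal Python sketch that parses the two performance-log lines quoted earlier and applies the correction. The regular expressions and the `inverted` flag are assumptions for illustration, based only on the line format shown in this thread; the inversion is reported here for 9.2.7.53, so other versions may not need it.

```python
import re

# Patterns based only on the two sample lines quoted in this thread.
BATCH_RE = re.compile(r"Batch Complete: (\d+) messages in (\d+) ms")
FULL_RE = re.compile(r"([\d.]+)% full reports")


def summarize(lines, inverted=True):
    """Return (messages, messages_per_sec, full_report_pct) for one batch.

    inverted=True applies the correction described above: the logged value
    is treated as the share of NON-full reports.
    """
    messages = elapsed_ms = pct = None
    for line in lines:
        m = BATCH_RE.search(line)
        if m:
            messages, elapsed_ms = int(m.group(1)), int(m.group(2))
        m = FULL_RE.search(line)
        if m:
            pct = float(m.group(1))
    if inverted and pct is not None:
        pct = round(100.0 - pct, 2)
    rate = messages / (elapsed_ms / 1000.0)
    return messages, rate, pct


sample = [
    "Wed, 02 May 2018 23:07:22 -0400 -- 1756 -- ----------- Batch Complete: 1202 messages in 77097 ms: 15 messages/sec",
    "Wed, 02 May 2018 23:07:22 -0400 -- 1756 --                             93.01% full reports",
]
messages, rate, pct = summarize(sample)
print(messages, round(rate, 1), pct)  # 1202 15.6 6.99
```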

There were still full reports coming in though as determined by the above line.

IBM L3 support confirms that once a week a client will send in a full report, so a small number of full reports is not a bad thing and is to be expected. Depending on the size of your BigFix implementation, that small number may not actually be a small number.
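
As a rough back-of-the-envelope check, the arithmetic for an environment of 50000 devices can be sketched in Python. It assumes the weekly full reports are spread evenly across the week and that FillDB is processing continuously at roughly the 15 messages/sec seen in the log above; both are simplifying assumptions, not measured facts.

```python
DEVICES = 50_000           # environment size mentioned earlier in the thread
FILLDB_MSGS_PER_SEC = 15   # rate from the sample log line (assumed steady)

# One full report per client per week, assumed evenly spread.
per_day = DEVICES / 7
per_hour = per_day / 24

# Share of all processed messages that the weekly full reports would make up.
baseline_share_pct = per_hour / (FILLDB_MSGS_PER_SEC * 3600) * 100

print(round(per_day), round(per_hour), round(baseline_share_pct, 2))  # 7143 298 0.55
```

Under those assumptions the weekly reports alone would be well under 1% of the traffic, so a corrected figure of 6.99% suggests something else is also generating full reports.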

For the particular issue I was experiencing, the number of full reports was too high to be just the weekly ones coming in. The root cause was a third-party application (one that is in no way related to BigFix) writing a log file into one of the BigFix client site directories.
To determine whether this is happening to your clients, look at the client log. The lines below are what was seen in the BigFix client log:

Retry error, attempt 9 failed for ForceNonexistence (C:\Program Files\BigFix Enterprise\BES Client__BESData\CUSTOMSITE*debug.log*)
Gather::SyncSiteByFile caught FileIOError (5) FileIOError
At 00:10:46 +0200 - CUSTOMSITE (BigFixRootServer/cgi-bin/bfgather.exe/CUSTOMSITE)
FAILED to Synchronize - Site data corrupted. - gather url - Relay:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=BigFixRootServer:52311/cgi-bin/bfgather.exe/CUSTOMSITE&Time=04May00:10:40&rand=a4931e4e&ManyVersionSha1=f1b0d1f6ed9233f3b89d0120865ab95053c7

If you continue looking in the logs, you will notice the following line:

At 18:07:46 +0200 -
Full Report posted successfully

So from the entries above, you can tell that there is a foreign file within the BigFix application folders that cannot be deleted, so the client fails to synchronize the custom site and thus sends in a full report. We had this happening on a few thousand devices.
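
If you need to check many client logs for this pattern, a small Python sketch along these lines may help. The patterns are taken only from the log lines quoted above (the sample is abbreviated from them), and the function name and returned dictionary are hypothetical; adjust the patterns if your client version words these messages differently.

```python
import re

# Patterns derived from the client log lines quoted in this thread.
LOCKED_RE = re.compile(r"failed for ForceNonexistence \((.*)\)")
SYNC_FAIL_RE = re.compile(r"FAILED to Synchronize")
FULL_REPORT_RE = re.compile(r"Full Report posted successfully")


def scan_client_log(text):
    """Count sync failures and full reports, and collect undeletable paths."""
    result = {"locked_paths": [], "sync_failures": 0, "full_reports": 0}
    for line in text.splitlines():
        m = LOCKED_RE.search(line)
        if m:
            result["locked_paths"].append(m.group(1))
        if SYNC_FAIL_RE.search(line):
            result["sync_failures"] += 1
        if FULL_REPORT_RE.search(line):
            result["full_reports"] += 1
    return result


# Sample abbreviated from the lines quoted above.
sample_log = r"""Retry error, attempt 9 failed for ForceNonexistence (C:\Program Files\BigFix Enterprise\BES Client__BESData\CUSTOMSITE*debug.log*)
Gather::SyncSiteByFile caught FileIOError (5) FileIOError
At 00:10:46 +0200 - CUSTOMSITE (BigFixRootServer/cgi-bin/bfgather.exe/CUSTOMSITE)
FAILED to Synchronize - Site data corrupted.
At 18:07:46 +0200 -
Full Report posted successfully
"""
result = scan_client_log(sample_log)
print(result["sync_failures"], result["full_reports"], result["locked_paths"])
```

A log that shows both a non-empty `locked_paths` list and a non-zero `full_reports` count is a candidate for the problem described here.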

It may seem obvious to some, but keep in mind that this affects only the site referenced above. So, for example, if you’re able to find the third-party application causing your issue, you can send a task with BigFix to fix BigFix. The only catch is that you can’t use the same site that is broken. In our case we used the Master Action Site to send a taskkill to the faulting application. After that our number of full reports dropped drastically, because once the file is no longer “locked” BigFix will automatically delete, add, update, etc. the files within its application directories.

I might be able to dig up more information on this to help those that face a similar issue, but this should be a good start for anyone else experiencing a similar issue.

I have been on these forums a lot as a result of googling things, just never a member until now. It seems like a great community and I hope I can help someone else as everyone on here has already helped me!

Thanks!

Thanks for the update…
I wonder why an external application is logging to BigFix data directories and locking them: was it maybe distributed and (mis)configured via BigFix because of some defaults?

We are currently in the process of figuring that out. I believe the debug.log file is written to a BigFix directory if the application is started natively by BigFix actionscript.

For example:

run "Full Application Path"

This seems to cause the file to appear in BigFix directories, more specifically the directory for the site that the action was taken from.

This however:

createfile until end
@ECHO OFF
cd "ApplicationDirectory"
Application.exe
end

move __createfile run.bat

override run
runas=currentuser
hidden=true
completion=none
run run.bat

Does not seem to create the file. So I think it’s something in how the application is written: it writes the log to its starting / working directory. I’m currently working with the developers of the third-party application to confirm this 100%.
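
The working-directory theory can be demonstrated outside BigFix with a short Python sketch. The child process below is only a stand-in for the third-party application (its real behavior is still unconfirmed), and the two temporary directories are hypothetical placeholders for the BigFix site directory and the application directory.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the third-party app: it opens debug.log with a RELATIVE path,
# so the file lands in whatever directory the process was started in.
CHILD = "open('debug.log', 'w').write('started\\n')"


def launch(cwd):
    """Start the stand-in application with the given working directory."""
    subprocess.run([sys.executable, "-c", CHILD], cwd=cwd, check=True)


with tempfile.TemporaryDirectory() as site_dir, tempfile.TemporaryDirectory() as app_dir:
    # Like a plain "run": the process inherits the launcher's working directory.
    launch(site_dir)
    log_in_site_dir = os.path.exists(os.path.join(site_dir, "debug.log"))
    log_in_app_dir_before = os.path.exists(os.path.join(app_dir, "debug.log"))

    # Like the batch file with "cd": the working directory is changed first.
    launch(app_dir)
    log_in_app_dir_after = os.path.exists(os.path.join(app_dir, "debug.log"))

print(log_in_site_dir, log_in_app_dir_before, log_in_app_dir_after)
```

If the real application behaves like this stand-in, changing the working directory before launching it (as the batch file does) would keep its log out of the site directory.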

Good piece of information to know. I’ve never seen this in the documentation before.