BigFix Inventory and Directory overload issues

Hi, we have implemented BigFix Inventory in our environment on approx. 5000 devices (total environment 140k). We're running into an issue on our Fake Root server where we get the following error message: "Overloaded buffer directory reached".
The way to resolve this is by changing the values of the settings below:
[Software\BigFix\EnterpriseClient\Settings\Client\_BESRelay_UploadManager_BufferDirectoryMaxSize]
value = 55368709120

[Software\BigFix\EnterpriseClient\Settings\Client\_BESRelay_UploadManager_BufferDirectoryMaxCount]
value = 50000
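
To confirm the new values actually took effect on the Linux relay, I assume they can be checked in the client's stored settings file (default install path shown; adjust if yours differs):

# check the applied buffer limits in the Linux client's settings store (default path assumed)
grep -A1 "UploadManager_BufferDirectory" /var/opt/BESClient/besclient.config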

What is not clear to me about the Upload Manager is which directory it uses. I haven't specified that in any settings, so does it default on Linux to the directory below?
/var/opt/BESRelay/wwwrootbes/bfmirror/downloads/sha1

Checking that directory on my fake root, it has 14500 files and is around 790 GB in size (cleanup is probably needed; I will take care of this in a separate project).

To avoid this issue, would it make sense to change the directory where the Upload Manager stores the files, i.e. the _BESRelay_UploadManager_BufferDirectory setting?
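
If we did relocate it, I assume the setting would look like the entries above, along these lines (the path here is only an illustrative example, not our actual target):

[Software\BigFix\EnterpriseClient\Settings\Client\_BESRelay_UploadManager_BufferDirectory]
value = /data/BESRelay/UploadManagerData/BufferDir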

I'm only seeing the issue on my Fake Root server; the regional relays are not showing any errors related to this.

Any advice/suggestions are welcome.
I've also opened a case with HCL Support on this.

The buffer directory referenced could be one of two things. There's a buffer directory for client reports, and one specifically for the Upload & Archive Manager. If this is related to Inventory, then it's likely the UploadManager directories that should be examined.

Note - these directories should be changing rapidly; files are added by clients posting archives, and removed from the directory when the archive is processed and moved to the parent relay.

For Uploads specifically check
/var/opt/BESRelay/UploadManagerData
and
/var/opt/BESRelay/UploadManagerData/BufferDir

For Client Reports check
/var/opt/BESRelay/FillDBData/BufferDir/
and
/var/opt/BESRelay/FillDBData/BufferDir/ForwardingBufferDir/
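
A quick way to gauge how much is sitting in each of those (paths assume a default Linux relay install):

# file count and total size of the Upload Manager buffer
find /var/opt/BESRelay/UploadManagerData/BufferDir -type f | wc -l
du -sh /var/opt/BESRelay/UploadManagerData/BufferDir

# same for the client-report buffer and its forwarding sub-directory
find /var/opt/BESRelay/FillDBData/BufferDir -type f | wc -l
du -sh /var/opt/BESRelay/FillDBData/BufferDir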

Jason,
Thanks for the info; that explains a lot. I'm working with HCL Support to understand why the folder "/var/opt/BESRelay/UploadManagerData/BufferDir/sha1/" keeps filling up with files. It currently holds 50000 files, most of them between 3 and 5 KB, and the total size of the folder is 500 MB.

This is our 'fake' root server (Linux Relay, version 10.0.10.46). I've checked our Root server and the same folder indeed exists there, but on that server we have subfolders named after the last two digits of the BigFix computer ID, and I can find files back that have been sent using Archive Manager. We have a clean-up script that removes files older than 60 days. Can this also be done on the fake root without causing any issues? I've read several articles and documentation, but it always mentions that it's up to the BigFix Administrator to keep this clean and prevent it from overloading, to the point where the relay eventually stops accepting requests.

Follow-up Question
Is there a way to find out which clients are sending data to this server? I've copied several files to a temp location and tried to read them, but they only contain info about PuttyPuttyPutty and nothing else.
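
For reference, I inspected them roughly like this (the file name is just a placeholder for one of the copied files):

# identify the data type and peek at readable strings in one of the buffered files (placeholder name)
file /tmp/buffer-sample
strings /tmp/buffer-sample | head -n 20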

The files have sha256 values as their names and are owned by root:root, and we did see a huge increase in the number of files since October 5th. Nothing has changed in our environment during that period or before.

Thanks for the help.

Those files should be removed when the fake-root relay posts them to the root server.

Check the ages of the oldest files in this directory over several minutes; the oldest files should be getting deleted first. If files are being posted to the root and cleared, but new ones are coming in too quickly, then check the connectivity speed and performance of the root server.
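
Something along these lines (path assumes a default Linux relay install) will show the oldest files once a minute so you can tell whether the buffer is draining:

# list the five oldest files in the upload buffer every minute; the names at the top should change if uploads are draining
watch -n 60 'ls -ltr /var/opt/BESRelay/UploadManagerData/BufferDir/sha1 | head -n 6'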

If no files are getting removed, I'd check whether the fake-root is having trouble connecting to the real root server, and ensure the HOSTS file entry is correct so this fake-root can reach the real root using the masthead hostname.
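
To sanity-check name resolution, something like this shows what the relay actually resolves for the masthead hostname (the hostname here is just an example):

# confirm the masthead hostname resolves to the intended root server address
getent hosts bes-root.domain.home
grep -i bes-root /etc/hosts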

Jason,
I've looked at the directory and below are the file counts for each day:
ls -al | grep -i "oct 11" | wc -l → 4104
ls -al | grep -i "oct 10" | wc -l → 9007
ls -al | grep -i "oct 9" | wc -l → 7767
ls -al | grep -i "oct 8" | wc -l → 4757
ls -al | grep -i "oct 7" | wc -l → 24908
ls -al | grep -i "oct 6" | wc -l → 1766
ls -al | grep -i "oct 5" | wc -l → 102
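
(As an aside, a one-liner like the following gives the same per-day breakdown in one pass; it's plain shell, nothing BigFix-specific:)

# count buffered files per modification date using the month/day columns of ls -l
cd /var/opt/BESRelay/UploadManagerData/BufferDir/sha1
ls -l | awk 'NR>1 {print $6, $7}' | sort | uniq -c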

The issue started Oct 5; before that we only had a handful of files that were not processed or sent to the root server.

When I check today, I see the file count increase every couple of minutes.
I've performed the following connectivity tests:
ping → OK
ssh to port 29450 → OK
(we use custom port 29450 for BigFix instead of the default 52311)

Connectivity is established, no issues.
I'm checking/reading some articles about the Upload Manager to see if there is anything that could explain this behavior.

If the files are under directories that match the computer IDs of your endpoints, then you could check those specific computer IDs in your console and see what actions are running on those clients. If no actions are visible in the console, you can check the BESClient logs directly on the machines.
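
On a Linux endpoint, the current day's client log can be checked with something like this (path assumes a default install):

# show the last entries of today's BESClient log on a Linux endpoint
tail -n 50 /var/opt/BESClient/__BESData/__Global/Logs/$(date +%Y%m%d).log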

That part confuses me. Attempting to ssh to the BigFix port should fail, as BigFix isn't talking the 'ssh' protocol; it's talking 'https'. Example:

C:\Users\Administrator>ssh bes-root.domain.home -p 52311
kex_exchange_identification: Connection closed by remote host

A better connectivity test from the relay to a root server or parent relay is to use the https protocol. Many application-aware firewalls (e.g. Palo Alto) need to allow not only 'tcp/52311', but specifically need rules allowing the 'https' application on tcp/52311. To test with curl, you could use a command like the following (substitute your root server and port number). This connects to the Server/Relay and retrieves the deployment's version:

C:\Users\Administrator>curl -k https://bes-root.domain.home:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version

ClientRegister
Version 11.0.2.125

The fact that you have files going back so far makes me think this fake-root may not be successfully connecting to the root server. Check the client setting values for '__RelayServer1' and '__RelayServer2', ensure these are configured to point at the correct root server, and check the BESClient and BESRelay logs to see whether it is connecting successfully, along with the connectivity test above.
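
On a Linux relay, a quick way to eyeball those settings and the relay log (paths assume a default install; client settings are stored in besclient.config):

# current relay-selection settings on the fake-root
grep -A1 "__RelayServer" /var/opt/BESClient/besclient.config

# recent relay log entries
tail -n 50 /var/log/BESRelay.log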

What version is your root server, and what version is the fake-root? If these are at BigFix 11, you might also need to check that any firewall between them supports the TLS 1.3 traffic. Also, on the root server use BESAdmin to check whether Enhanced Security is enabled, and whether 'Require TLS 1.2' or 'Require TLS 1.3' is enabled, as those could have effects on firewall connections as well.

You can also use curl parameters to explicitly check connections with tlsv1.2 and tlsv1.3:

C:\Users\Administrator>curl -k https://bes-root.domain.home:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version --tlsv1.2 --tls-max 1.2

ClientRegister
Version 11.0.2.125

C:\Users\Administrator>curl -k https://bes-root.domain.home:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version --tlsv1.3 --tls-max 1.3

ClientRegister
Version 11.0.2.125

Please try those and report back.

Edit: Another part confuses me -

In recent versions of BigFix, we have updated the 'cleanup' tools available in the BESAdmin tool to allow removing uploads for computers that have been removed from the deployment. It should not be necessary to manually clean up the UploadManagerData/BufferDir/sha1 directory on the root server, and it should never be required on Relays at all (the Relay doesn't store UploadManagerData persistently; the relay should remove it when it uploads to the parent).

Jason et al.,
I've done the connectivity test using curl on the relay and it works just fine:
[root@azl-besflover gendera.d]# curl -k https://bfixroot.pg.com:29450/cgi-bin/bfenterprise/clientregister.exe?RequestType=version

ClientRegister
Version 10.0.10.46

I've checked the folder again today and noticed the number of files has decreased; I can also see this when comparing the individual days with last week.
Last week (10/11): 51345 → today (10/15): 37395

                                      10/11   10/15

ls -al | grep -i "oct 15" | wc -l →       -    6357
ls -al | grep -i "oct 14" | wc -l →       -   18043
ls -al | grep -i "oct 13" | wc -l →       -    4063
ls -al | grep -i "oct 12" | wc -l →       -    4298
ls -al | grep -i "oct 11" | wc -l →    4104    4311
ls -al | grep -i "oct 10" | wc -l →    9007       1
ls -al | grep -i "oct 9" | wc -l  →    7767       1
ls -al | grep -i "oct 8" | wc -l  →    4757       0
ls -al | grep -i "oct 7" | wc -l  →   24908       3
ls -al | grep -i "oct 6" | wc -l  →    1766       0
ls -al | grep -i "oct 5" | wc -l  →     102      11

While working with HCL Support I also noticed that debug logging was enabled on this relay, which I disabled last week as well, so I'm not sure if that was adding to the issue by keeping the relay too busy. I will have another session with HCL Support later today to look further into this.
I'll keep you posted on the findings. Thanks for all the advice and suggestions.

@fermt this is our fake root server (Linux relay), which does not have the file structure you explained. I can see that structure on our Root server, and we're using it when collecting logs/files from various clients.