Tip: Troubleshooting Client Responsiveness

I hope to make a thread here or Wiki post to cover some steps to diagnose one of the most common reported issues, “My clients are not reporting”. This can be complex to diagnose, because there can be several causes related to each customer’s specific configuration. Nothing here is meant to supplant advice from BigFix Support or BigFix Services, but my hope is this can be a handy reference for first steps at diagnosing common issues.

The settings discussed in this post can be found at https://help.hcltechsw.com/bigfix/9.5/platform/Platform/Config/r_client_set.html , which has links to additional concepts such as automatic/manual relay selection, command polling, persistent connections, etc.

When a client appears to not be reporting (i.e. the "Last Report Time" value is not updating in the Console)

The first place to check is the client logs, found in C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\__Global\Logs on Windows, or /var/opt/BESClient/__BESData/__Global/Logs on Linux/UNIX. Check for “Relay select” messages to see whether Relay Selection is working, and for “Report posted successfully” messages to see whether reports are being sent.
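As a rough aid for the log check above, here is a minimal Python sketch that counts how many log lines mention each phrase. The function names are my own, the phrases are simply the ones quoted in this post (matched case-insensitively, since the exact wording in your logs may differ), and it assumes the client log files in that folder end in .log.

```python
from pathlib import Path

# Phrases quoted in this post; matched case-insensitively because the
# exact capitalisation in real client logs may differ (assumption).
PHRASES = ("relay select", "report posted successfully")


def count_phrases(lines, phrases=PHRASES):
    """Return {phrase: number of lines containing it}, ignoring case."""
    counts = {phrase: 0 for phrase in phrases}
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


def scan_log_dir(log_dir, phrases=PHRASES):
    """Scan every *.log file in log_dir and return {file name: counts}."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes.
        with open(path, errors="replace") as handle:
            results[path.name] = count_phrases(handle, phrases)
    return results
```

Calling scan_log_dir("/var/opt/BESClient/__BESData/__Global/Logs") on a Linux client would give one set of counts per log file. A file with zero counts for both phrases is a hint to read that day's log by hand, not a diagnosis in itself.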

If the client is not successfully connecting to a Relay via Automatic Relay Select:

Automatic Relay Selection requires that the client first Ping a relay, and then connect to responding relays via TCP/52311 (by default). Check that the client can resolve relay names via ‘nslookup’ or ‘dig’. Check that the client can ping at least one of the Relays. Check that the client can connect to the same relay via a browser - try to download a copy of the actionsite file via http://yourserver.yourdomain.com:52311/masthead/masthead.afxm . I like to use this URL because it does not require Relay Diagnostics to be enabled, just connectivity to the relay. A good connection will show a copy of your deployment’s actionsite.afxm file. A prompt for a certificate or an “HTTP 403: Forbidden” message is also good - it indicates you are making it to the Relay, but the relay has Relay Authentication turned on and will only allow a client with a BigFix-generated certificate to connect.
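The name resolution and TCP checks above can also be scripted. This is a minimal sketch using only the Python standard library; check_relay is a hypothetical helper name, and the default port assumes your deployment still uses 52311. It does not send an ICMP ping (that normally needs elevated privileges), so the ping test still has to be done separately.

```python
import socket


def check_relay(host, port=52311, timeout=5.0):
    """Return (resolved_ip_or_None, tcp_ok) for a relay host.

    resolved_ip is None when the name cannot be resolved.
    tcp_ok is True only if a TCP connection to host:port succeeds.
    """
    try:
        ip = socket.gethostbyname(host)  # same question nslookup/dig answers
    except socket.gaierror:
        return None, False
    try:
        # Open and immediately close a TCP connection to the relay port.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Refused, timed out, unreachable, etc.
        return ip, False
```

A result of (None, False) points at DNS, while an IP address with False points at routing or a firewall between the client and the relay. A True result only shows that the port is reachable; it says nothing about Relay Authentication.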

For new clients that have never joined the deployment, or those that roam where Relays cannot be pinged (Internet / DMZ clients, for example), ensure that the _BESClient_RelaySelect_FailoverRelay or _BESClient_RelaySelect_FailoverRelayList client settings have values. When an Automatic Relay Select client cannot ping any Relays, it will fall back to these settings and try to connect to these relay(s) even if they cannot be pinged. Newly installed clients that have not yet registered and discovered a list of Relays for your deployment can also use these settings during their initial registration.

If the client is posting reports and the "Last Report Time" in the console is updating, but the client does not process actions or report on new Analyses for a long time:

The client is likely not being informed about new content. By default, the client should receive a notification from its parent relay via udp/52311 whenever there is new content to evaluate - a new Action, new Analysis, new Fixlet, etc.

Check the client logs for “GatherHashMV” messages. If these are not appearing in the log, the client is not receiving the notifications. Check for a host-based firewall that may be blocking the traffic on the endpoint. There are fixlets in the BES Support site for Windows, Red Hat, and possibly other distributions to open host-based firewall rules for BESClient. Even if these fixlets are not relevant, it’s worth double-checking the firewall by hand; as new OS distributions change how the client firewall behaves, these fixlets may require updating.

If network firewalls, NAT translations, etc. block the UDP messages from the Relays, then consider enabling Persistent Connections or Command Polling from the clients. Command Polling configures the client to periodically check for new site content on their Relays; generally values of one to three hours are ideal polling intervals. Persistent Connections, configured on both the client and the relay, configure the client to keep their TCP connection to the relay open so the relay can send these “new content available” messages to the client over that persistent channel, at a cost of slightly higher network load on the Relay.

Also consider that in a chained relay structure, before a relay can notify its clients of new content, the relay itself must be aware of it. The Root server and Parent Relays send notifications to child relays over tcp/52311 - so tcp/52311 should be enabled in both directions between Root Servers and Parent/Child Relays.

One more note here: with Command Polling enabled, a client may log GatherHashMV messages immediately after a Command Poll (which is also logged), when the BESClient service restarts, or when a new Relay Selection/Client Registration occurs (every 6 hours, by default). In these cases the GatherHashMV message does not indicate that the client is receiving the UDP notification, but rather that it discovered new content from the Command Poll/site gather.

If the client is posting reports, but much less frequently than expected; is receiving GatherHashMV messages, but is slow to execute or report back Analysis results, there may be long-running evaluations

This is one of the more difficult conditions to diagnose, and usually requires enabling debug logging on the client, and/or custom Analysis properties to identify long-running property evaluations.

There are very useful Analyses at bigfix.me, including https://bigfix.me/analysis/details/2994765 and https://bigfix.me/analysis/details/2998690 , for tracking Properties that have long-running evaluations. Those can slow down the overall client evaluation cycle, so consider optimizing them or reducing how often they evaluate via the Evaluation Period defined on the property.

Additionally, it may be advisable to increase _BESClient_Report_MinimumAnalysisInterval, as described in the post “What is Report MinimumAnalysisInterval? (Client Setting)”. This can be especially useful if many Properties are configured to evaluate “On Every Report”, as these can delay sending new reports (especially while Actions are processing and sending updated status reports) when these long-running properties have to be re-evaluated before sending the new reports.

If a new Action reports "Evaluating" for a long time without progressing

The BigFix client can only execute one action at a time. Check whether another action is still running (and possibly “stuck”) on the client. From the Console, open the computer, check the Action History tab, and expand Actions -> By Property -> By Action Status. Look for any in the “Running” state, as these can block the new action from executing.

A common cause for an action to get “stuck” is using an ActionScript command like ‘wait’ or ‘waithidden’ to run an executable that attempts to take input from the user. As the client executable cannot display a window to the user, it may never progress.

To clear an open action that is “stuck” in this state, either Stop the action from the console, or restart the BigFix Client. Note this allows the BESClient itself to continue, but does not stop the running executable the BESClient launched. This can leave the __Download folder locked if the spawned process is executing from the folder, which can prevent downloads from the next BigFix action from being processed. It is often necessary to kill the spawned processes, using a ‘kill’ command on Linux/UNIX or a ‘taskkill.exe’ command on Windows.

From BigFix 9.5.11, preventative measures can be put in place to prevent and recover from this condition. See the client settings for _BESClient_ActionManager_OverrideTimeoutSeconds and _BESClient_ActionManager_OverrideDisposition

If the client is processing actions and content according to its logs, and is sending reports indicated by 'Report Posted Successfully' messages, but the Console status is not updating

This could be a much less common case where a Relay or Root Server is failing to process the uploaded reports. Check each Relay in the chain from the client up to the root for stuck reports.

Check the BES Relay\FillDBData\BufferDir folder over a period of several minutes. We expect to see many very tiny files get created, and almost immediately removed, from the folder. These are client reports that are received, and then sent up to the Parent Relay or Root Server.

If files are created in this directory but remain for several minutes, or do not get removed at all, the Relay may not be able to report in to its Parent Relay. Check the BES Relay\logfile.txt or the BES Client Log on the child and parent relays; and the Root Server’s BES Server\BESRelay.log and BES Server\FillDBData\FillDB.log files for errors.
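Watching the BufferDir folder by eye gets tedious, so here is a minimal Python sketch that lists files which have sat in a folder longer than a threshold. stale_files is a hypothetical helper, the five-minute default is just my reading of "several minutes", and it uses each file's modification time as a stand-in for when the report arrived.

```python
import os
import time


def stale_files(directory, max_age_seconds=300, now=None):
    """Return sorted names of regular files older than max_age_seconds.

    Age is measured from the file's modification time. Pass `now`
    (seconds since the epoch) to make the check repeatable.
    """
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
                    stale.append(entry.name)
            except FileNotFoundError:
                # The file vanished between listing and stat, which is the
                # healthy case: the relay already forwarded the report.
                continue
    return sorted(stale)
```

Running stale_files against the relay's FillDBData\BufferDir folder every minute or so should normally return an empty list. Names that keep showing up across runs are the reports worth chasing in the relay and server logs mentioned above.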


Hi Jason, you might want to add DNS Servers to the list of reasons clients don’t respond to UDP messages…


Good point! Will update when I’m back at work and have my BES Server booted up. There are tasks in BES Support for that condition I’ll want to reference.

Our Tasks look for the DNS service specifically, but there are other services (Exchange, probably others) that can consume a huge number of UDP ports at startup and could also interfere.

Really the UDP port conflict could come from any server application, it’s just more likely with these services that consume thousands of ports.
