Linux BES Client stops reporting, goes offline, still receives and executes actions

Good afternoon all,

I have a problem whereby some of our Linux BigFix clients running Ubuntu 16.04 just stop reporting. I’ve recently been handed our relatively new BigFix infrastructure and have spent a lot of time in the last two learning it and making documentation for everything, so if I’ve missed any troubleshooting steps please let me know and I’ll perform testing.

I’ve verified and done the following:

  • The besclient service is running, but the last time “Report posted successfully” appears is just after besclient is started. In my logs, besclient did sync with three of its sites, but didn’t post a report at the same time.
  • The logs indicate the last report time being a similar time to what the Console says (with no errors)
  • Restarting the besclient service on the client makes the client appear live again in Console for 45 minutes, until the “Mark as offline after” setting kicks in and marks it offline. No errors get logged in the client logs.
  • Our Windows clients/servers don’t have this issue, and not all of our Linux servers exhibit this behavior.
  • Whilst online in Console, I can raise actions, and the client will receive them.
  • Whilst appearing offline in Console, I can raise actions, and the ‘offline’ client does receive them and process them, shortly before returning “Report posted successfully”. The client then comes back online in the Console. 45 minutes later, the system goes offline in Console.

In BigFix Console, client heartbeat interval is set to 15 minutes. It definitely doesn’t report every 15 minutes.

To me, as someone who has just started learning BigFix, it looks like the client isn’t reporting automatically, as a result of a misconfiguration by my predecessor or something specific to these Linux hosts (Ubuntu 16.04 and 14.04 are both affected; all our CentOS hosts and an Ubuntu 18.04 host are reporting normally). I quickly logged on to one of the good Linux hosts running CentOS to check, and I can confirm that the BESClient there is reporting successfully every 15 minutes. Some of the other 14.04 and 16.04 hosts stop reporting after the first report following startup.

Any ideas or anything I can try? I feel like I’m banging my head up against a brick wall. Many thanks in advance.

In the BigFix Console you can turn on debug logging to get a better perspective.
If you don’t see anything obvious, I would lodge a PMR.
They are going to ask for logs… and probably for you to run some diagnostics from the BES Support site.
On a hunch based on something I saw just yesterday, I would look through the custom analyses on your BigFix server that are applicable to this machine. I recently saw a piece of custom relevance that measured sha2s of descendants of “/”… That is very expensive relevance, and the machine was computing sha2s of all file objects under “/”… Anyway, give it a try.
-jgo


@lsward, could you post the contents of one of the daily BESClient log located at /var/opt/BESClient/__BESData/__Global/Logs?

Apologies for the non-response. I have a PMR open for another issue at the moment, in that my console now shows virtually nothing. If I turn on “Show non-relevant content” it shows the BES Support site, and that’s it, but IBM Web Reports shows all my computers and sites as normal. This is a different topic though, so I won’t go into detail on this thread.

I’ll respond with client logs as soon as this problem is fixed as it’s possible client behavior might be affected by this issue and the logs might not reflect this issue by itself.


OK, that problem is now solved. It unfortunately wasn’t related to this issue.

I’ve narrowed the issue down to just Ubuntu 16.04 and 14.04 hosts. I enabled debug logging with a debug level of 10000.

I then manually restarted the besclient to see if there was anything on startup of the client that might be of use. I don’t know if this points anything out in particular:

I also started noticing the following during background evaluation…

Aside from this, everything seems normal. Trying to be careful I don’t reveal any of the hostnames or domains of our infrastructure.

The normal log displays the following whilst the above is happening…

Current Date: June 7, 2018
   Client version 9.5.9.62 built for Ubuntu 10 amd64 running on sysname:Linux release:4.4.0-38-generic arch:x86_64
   Current Balance Settings: Use CPU: True Entitlement: 0 WorkIdle: 10 SleepIdle: 480
   Locale: LC_ALL="" LC_CTYPE="" LC_MESSAGES="" LANG="en_GB.UTF-8"
   ICU 54.1 init status: SUCCESS
   Agent internal character set: UTF-8
   ICU report character set: UTF-8 - Transcoding Disabled
   ICU fxf character set: windows-1252 (Latin 1 / Western European) - Transcoding Enabled
   ICU local character set: UTF-8 - Transcoding Disabled
   EMSG Logging Detail Level set to: 10000
At 13:48:28 +0100 -
   Starting client version 9.5.9.62
   FIPS mode disabled by default.
   Cryptographic module initialized successfully.
   Using crypto library libBEScrypto - OpenSSL 1.0.2j-fips  26 Sep 2016
   Initializing Site: actionsite
   Restricted mode
   Initializing Site: BES Support
   Initializing Site: CustomSite_Test
   Initializing Site: Patches for Ubuntu 1604
   Initializing Site: Patching Support
   Initializing Site: mailboxsite
   Processing Download plugins
   Beginning Relay Select
At 13:48:29 +0100 -
   RegisterOnce: Attempting secure registration with '<one of our relays>'
   Unrestricted mode
   Configuring listener without wake-on-lan
   Registered with url '<one of our relays>'
   Registration Server version 9.5.6.63 , Relay version 9.5.6.63
   Relay does not require authentication.
   Client has an AuthenticationCertificate
   Relay selected: <one of our relays>. at: <removed for security>:52311 on: IPV4 (Using setting IPV4ThenIPV6)
At 13:48:31 +0100 -
   PollForCommands: Requesting commands
   PollForCommands: commands to process: 1
   Entering Service Loop.
   Starting Service Loop.
   A2AServer::Start().
   Successful Synchronization with site 'actionsite' (version 581) - 'http://<removed for security>:52311/cgi-bin/bfgather.exe/actionsite'
At 13:48:32 +0100 -
   Successful Synchronization with site 'mailboxsite' (version 20) - 'http://<removed for security>:52311/cgi-bin/bfgather.exe/mailboxsite12853353'
   [ThreadTime:13:48:31] SetupListener success: IPV4/6
   Encryption: optional encryption with no certificate; reports in cleartext
   Report posted successfully
At 13:48:38 +0100 - BES Support (http://sync.bigfix.com/cgi-bin/bfgather/bessupport)
   Relevant - BES Client Setting: Disable Debug Logging (fixlet:196)
At 13:49:58 +0100 -
   Report posted successfully
At 13:58:32 +0100 -
   Successful Synchronization with site 'CustomSite_Test' (version 466) - 'http://<removed for security>:52311/cgi-bin/bfgather.exe/CustomSite_Test'

No other errors are logged, and no other reports were posted. It does spend a long time evaluating Ubuntu 16.04 patches though, with that “Sync error” message every second or so as it’s doing it.
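For anyone wanting to measure this rather than eyeball it, here is a minimal sketch that reads a daily client log in the format pasted above and lists the time between “Report posted successfully” entries, plus how long the log keeps going after the last report. The function names are my own and purely illustrative, and it assumes every entry is preceded by an “At HH:MM:SS +ZZZZ -” line as in the excerpt; it is not an official BigFix tool.

```python
import re
import sys
from datetime import datetime

# Matches header lines such as "At 13:48:28 +0100 -" (format taken from the log above).
AT_LINE = re.compile(r"^At (\d{2}:\d{2}:\d{2}) [+-]\d{4} -")
REPORT_MARKER = "Report posted successfully"


def parse_log(log_text):
    """Return (report_times, last_seen) for one daily log.

    report_times: time of every successful report, in log order.
    last_seen: the last timestamp found anywhere in the log, or None.
    """
    report_times = []
    current = None
    for line in log_text.splitlines():
        match = AT_LINE.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%H:%M:%S")
        elif REPORT_MARKER in line and current is not None:
            report_times.append(current)
    return report_times, current


def report_gaps(report_times):
    """Return the time between each pair of consecutive reports."""
    return [later - earlier for earlier, later in zip(report_times, report_times[1:])]


def silence_after_last_report(report_times, last_seen):
    """Return how long the log continues after the last report, or None."""
    if not report_times or last_seen is None:
        return None
    return last_seen - report_times[-1]


if __name__ == "__main__":
    # Usage: python report_gaps.py <path to a daily client log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        times, last = parse_log(handle.read())
    for gap in report_gaps(times):
        print("gap between reports:", gap)
    print("log activity after last report:", silence_after_last_report(times, last))
```

Because the daily logs only carry a time of day, the sketch treats each file on its own and does not try to join gaps across midnight. With a 15 minute heartbeat, any gap or trailing silence well beyond 15 minutes is the behavior described in this thread.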

Hope this helps, if not, no worries, I’ll raise a PMR with IBM.


Raise the PMR.

The A2A error is normally informational within debug, but that pipe error is troubling.

What is the last fixlet to evaluate from BES Support before the agent becomes unresponsive?

-jgo


I’m wondering if there was ever a solution found for this. I have the same problem with Ubuntu 16.04 and 18.04 clients and I’m not getting much from support.

I suggest you should probably open a new topic, and include some snippets from your clients’ logs. We would probably need a lot of the same info that Support should be asking, such as messages from your client logs for relay selection, and how frequently they are reporting, along with whether host-based or network firewalls are blocking any of the traffic.
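As a quick first check on the firewall question, a sketch like the following can be run on the client to see whether a TCP connection to the relay can be opened at all. The helper name and the relay hostname are placeholders of mine; port 52311 is the one shown in the log earlier in this thread. It only tests the outbound direction from client to relay, so a passing result does not rule out something blocking traffic in the other direction.

```python
import socket


def can_connect(host, port=52311, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    relay = "relay.example.com"  # placeholder: substitute your own relay
    print(relay, "reachable on 52311:", can_connect(relay))
```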

There are several different types of “not reporting” clients so the diagnostic steps may vary.

I’ve just put together a post with general information for several common diagnosis items that I think may be helpful at Troubleshooting Client Reponsiveness