macOS BESAgent appears to stop checking in

eg2428 · April 29, 2025, 7:12pm

In a few cases (but not too few) we’ve seen BigFix seemingly stop running on macOS devices. The devices are online and checking in to other management platforms, but the BESAgent appears dead in its tracks. Looking at the logs, they always appear fine, right up until they abruptly stop.

Here’s a device that stopped checking in about 2 months ago:

At 20:42:30 -0500 - 
   Successful Synchronization with site 'actionsite' (version 866023) - 'http://bigfix.company.com:52311/cgi-bin/bfgather.exe/actionsite'
At 20:42:35 -0500 - 
   Successful Synchronization with site 'mailboxsite' (version 2) - 'http://bigfix.company.com:52311/cgi-bin/bfgather.exe/mailboxsite1079874398'

That’s the last thing that reported to BigFix, despite the device being powered and online. The log folder has consistent logs up until 20250304.log, which is the last log file created (today being 2025-04-29).

The client certificate doesn’t expire for 12 months.

Typically we can kick the agent with sudo launchctl stop com.bigfix.BESAgent and it will start reporting again.
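On recent macOS releases the same kick can be done in one step with launchctl kickstart. A minimal sketch, assuming the job is loaded in the system launchd domain (the label comes from the command above; the domain is an assumption). The function name and the LAUNCHCTL variable are hypothetical and only there so the command can be previewed without touching the service:

```shell
#!/bin/sh
# Restart the BigFix agent in a single launchctl call.
# Assumption: the job lives in the "system" launchd domain.
# LAUNCHCTL is a hypothetical dry-run hook (e.g. LAUNCHCTL=echo);
# it is not a launchctl feature.
restart_besagent() {
    label="${1:-com.bigfix.BESAgent}"
    # -k kills the running instance (if any) before starting it again
    "${LAUNCHCTL:-launchctl}" kickstart -k "system/${label}"
}
```

Run as root, restart_besagent performs the restart; with LAUNCHCTL=echo set first, it only prints the arguments it would pass to launchctl.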

macOS devices are usually running something close to the latest release (recent issues seen on 15.3.1 and 15.4 devices), and we’re currently running BigFix 11.0.3. However, a post from 2 years ago, “BESAgent on macOS just stopping?”, describes our issue pretty much exactly, so I think this has been happening for a while across several releases.

Has anyone else seen this recently, or have a fix or something specific to look into? Thank you.

eg2428 · April 29, 2025, 7:14pm

I’ve started using a watchdog script in another platform to try to resuscitate BigFix if it’s been too long since a log file was created (so far the only indicator I’ve found that something is wrong):

#!/bin/zsh
# Watchdog: restart BESAgent when no daily client log has been created recently.

logDir="/Library/Application Support/BigFix/BES Agent/__BESData/__Global/Logs"

if [[ ! -f "/Library/BESAgent/BESAgent.app/Contents/MacOS/BESAgent" ]]; then
        echo "BigFix does not exist on target machine"
        exit 40
fi

sleep 8

# Name of the newest daily log without its extension, e.g. 20250304
newestLog() {
        basename "$(ls "$logDir" | grep -E '^[0-9]{8}\.log$' | sort -r | head -n1)" .log
}

lastLog=$(newestLog)
currDate=$(date '+%Y%m%d')
currEpoch=$(date +%s)
# Treat the newest log as covering its whole day (until 23:59:59).
# If no daily log exists at all, fall back to 0 so the agent counts as stale.
lastEpoch=$(date -jf "%Y%m%d:%H%M%S" "${lastLog}:235959" +%s 2>/dev/null) || lastEpoch=0

lastAction="None"
if (( currEpoch - lastEpoch > 8*60*60 )); then
        lastAction="Restart"
        launchctl stop com.bigfix.BESAgent
        sleep 10
        # Start the job only if launchd has not already relaunched it (no PID listed)
        if ! launchctl list com.bigfix.BESAgent | grep -q '"PID"'; then
                launchctl start com.bigfix.BESAgent
        fi
fi

sleep 10

# Last line of today's log with spaces stripped (empty if there is no log for today)
lastLine=$(tail -n 1 "$logDir/$currDate.log" 2>/dev/null | grep -v "^.$" | tr -d ' ')
echo "LastRun:$currDate LastAction:$lastAction LastLog:$(newestLog) LastLine:$lastLine"

However, I’d prefer to avoid such a hacky solution.
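A possible variation on the check in that script is to look at file modification times instead of parsing dates out of file names. The file-name check only fires once a new day's log has failed to appear, while a modification-time check could notice a stall within the same day. This is a minimal sketch, assuming the logs end in .log and that a healthy agent writes to its log at least once per window while the machine is awake. The function name logs_are_fresh is made up for illustration:

```shell
#!/bin/sh
# Succeeds (exit 0) if any *.log file directly inside the directory was
# modified within the last N minutes; fails otherwise.
logs_are_fresh() {
    log_dir="$1"
    max_age_min="${2:-480}"   # 480 minutes = the 8-hour threshold used in the script
    # -mmin -N matches files modified less than N minutes ago
    recent=$(find "$log_dir" -maxdepth 1 -type f -name '*.log' \
                 -mmin "-${max_age_min}" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}

# Example use inside a watchdog:
#   if ! logs_are_fresh "/Library/Application Support/BigFix/BES Agent/__BESData/__Global/Logs"; then
#       ...restart the agent...
#   fi
```

A machine that has just woken from a long sleep would also look stale to this check, so it may restart the agent more often than strictly needed.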

vk.khurava · April 30, 2025, 3:13am

We have also encountered issues with Mac clients not reporting, though not exactly the same as the one you’re experiencing. Initially, we tried several manual fixes, but we couldn’t pinpoint which client was facing which specific issue. This lack of visibility led to a negative perception of BigFix. To address this, we developed a Self Heal script for our Mac devices.

The Self Heal script performs the following steps:

1. Checks the keystorage folder to ensure the client has all the necessary files, confirming that the keystorage folder is healthy and the client registration was successful (based on discussions with the HCL team).
2. Verifies the BESClient service is running. If the service is found running and the BESClient log shows “Report Posted Successfully,” the script exits.
3. Starts the BESClient service if it is not running.
4. Re-checks the BESClient log for “Report Posted Successfully.”
5. If the checks don’t pass, the script uninstalls and re-installs the client.
6. If, after reinstallation, BESClient still isn’t functioning properly, the script updates the JAMF logs to notify the engineering team.

This solution has been a lifesaver for us. Since implementing it, we haven’t encountered any further issues, and everything is running smoothly.

When we first implemented the Self Heal script, we discovered many duplicate devices. Once those underlying issues were fixed, the duplicates stopped appearing.

On the Mac team’s side, they created a JAMF policy that deploys a daemon to run the Self Heal script every day at an agreed interval. This keeps the Mac clients monitored and self-healing without manual intervention.

eg2428 · April 30, 2025, 2:54pm

This sounds interesting, thank you.

Could you expand a bit on the criteria you’re looking for in the keystorage folder? Is it just the existence of the cert files?
How are you handling the reinstall/secure deployment of afxm files, etc.? I wouldn’t want to just throw them up in a public S3 bucket, but we have machines that are offsite, so I can’t just use an internal file server or limit access to our network. Are you re-using the afxm already on the device, or have you seen issues with the afxm missing or being corrupted as well?
Do you have any metrics on how often the script needs to take an action, whether it be restarting the BESAgent service or fully reinstalling?

Thanks again for sharing.

eg2428 · April 30, 2025, 2:58pm

For some more background info, I just recently noticed this happening on my own device. No logs have been created since 1400 yesterday, and it’s about 1100 today, with the device online and in use for several hours in between.

BESAgent is running, but no log file is being created, and the device is not checking in or seemingly even attempting to.

I was moving around a bit between buildings yesterday, so maybe closing my laptop at exactly the wrong time caused this? Without logs, I’m unsure how to diagnose it.

vk.khurava · April 30, 2025, 3:34pm

In short, the key indicators of a healthy client are:

- Presence of 5 specific files in the Keystorage folder
- Log entries confirming:
  - “Relay Selected Successfully”
  - “Report Posted Successfully”
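The indicators above could be checked along these lines. This is only a sketch under stated assumptions: the keystorage and log paths are passed in as arguments rather than hard-coded, the five file names are not given here so only the file count is checked, and the phrases are matched case-insensitively in case the real log wording differs slightly. The function name client_looks_healthy is hypothetical:

```shell
#!/bin/sh
# Returns 0 when the keystorage directory holds at least the expected
# number of files AND the log contains both success phrases.
client_looks_healthy() {
    key_dir="$1"
    log_file="$2"
    expected_files="${3:-5}"

    [ -d "$key_dir" ] || return 1
    # Count regular files directly inside the directory (tr strips wc padding)
    file_count=$(find "$key_dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    [ "$file_count" -ge "$expected_files" ] || return 1

    [ -f "$log_file" ] || return 1
    # -F: fixed string, -i: ignore case, -q: quiet (exit status only)
    grep -qiF "Relay Selected Successfully" "$log_file" || return 1
    grep -qiF "Report Posted Successfully" "$log_file" || return 1
    return 0
}
```

A caller would pass the client's keystorage directory and the current day's log file, and treat a non-zero exit status as a reason to investigate further.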

Everything is handled by JAMF.

Currently, the script is scheduled to run every 4 hours, with reinstallation being treated as a last-resort step. Here’s the logic it follows:

- Check for the BESAgent application and presence of keystorage files
- If the BESAgent service is running, verify the logs to confirm if a report has been posted, then exit
- If the service is not running, restart it and check the status again at the next scheduled interval, then exit
- If the service is running but no report is found, restart the service and exit
- If the application or keystorage files are missing, JAMF will trigger a reinstall of the BESClient
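The branching above can be summarized as a small decision function. This is a sketch of the described logic, not the actual Self Heal script: the function name, the "yes"/"no" arguments, and the action names it prints are all made up for illustration, and the four inputs would come from separate checks:

```shell
#!/bin/sh
# Maps the four checks to a single action. Each argument is "yes" or "no".
# Prints one of: reinstall, start, restart, none.
decide_action() {
    app_present="$1"
    keys_ok="$2"
    service_running="$3"
    report_posted="$4"

    if [ "$app_present" != "yes" ] || [ "$keys_ok" != "yes" ]; then
        echo "reinstall"   # last resort, handed off to JAMF
    elif [ "$service_running" != "yes" ]; then
        echo "start"       # status is re-checked at the next scheduled run
    elif [ "$report_posted" != "yes" ]; then
        echo "restart"     # running, but no report found in the log
    else
        echo "none"        # healthy, nothing to do
    fi
}
```

For example, decide_action yes yes yes no prints restart, which matches the case where the service is running but no report is found.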

vk.khurava · April 30, 2025, 3:37pm

I recommend enabling debug client logging to gain deeper visibility into what’s happening behind the scenes. This will help identify any hidden issues that might not surface in standard logs.

eg2428 · April 30, 2025, 3:50pm

Thanks, I’ll try increasing the logging level.

I think it might be a memory leak, based on logs in the console.

eg2428 · May 12, 2025, 4:43pm

To follow up on this, I enabled Client Debug Logging, but the besclientdebug.log file also stops.

[Screenshot taken today, 2025-05-12: the debug log was last written to on 2025-05-08, around the same time the last lines were written to the regular logs.]

I’ve submitted a ticket.