In a few cases (but not too few) we’ve seen BigFix seemingly stop running on macOS devices. The devices are online and checking in to other management platforms, but the BESAgent appears to have stopped dead in its tracks. The logs always look fine, right up until they abruptly stop.
Here’s a device that stopped checking in about 2 months ago:
At 20:42:30 -0500 -
Successful Synchronization with site 'actionsite' (version 866023) - 'http://bigfix.company.com:52311/cgi-bin/bfgather.exe/actionsite'
At 20:42:35 -0500 -
Successful Synchronization with site 'mailboxsite' (version 2) - 'http://bigfix.company.com:52311/cgi-bin/bfgather.exe/mailboxsite1079874398'
That’s the last thing that reported to BigFix, despite the device being powered and online. The log folder has consistent logs up until 20250304.log, which is the last log file created (today being 2025-04-29).
The client certificate doesn’t expire for 12 months.
Typically we can kick the agent with sudo launchctl stop com.bigfix.BESAgent and it will start reporting again.
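To confirm that the kick actually brought the agent back, something like the sketch below could help. It assumes `launchctl list <label>` prints a `"PID" = <number>;` line while the job is running; the function name `pid_from_launchctl` is made up for illustration.

```shell
# Reads `launchctl list <label>` output on stdin and prints the PID, if any.
# Assumption: a running job is listed with a line of the form  "PID" = 1234;
pid_from_launchctl() {
    sed -n 's/.*"PID" = \([0-9][0-9]*\);.*/\1/p'
}

# Usage on the Mac itself (commented out so the snippet has no side effects):
#   sudo launchctl stop com.bigfix.BESAgent
#   sleep 10
#   pid=$(sudo launchctl list com.bigfix.BESAgent | pid_from_launchctl)
#   if [ -n "$pid" ]; then echo "BESAgent running with PID $pid"; else echo "BESAgent not running"; fi
```

An empty result means the job is loaded but has no running process, which is the case where an explicit `launchctl start` is still needed.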
macOS devices are usually running somewhere around the latest release (recent issues seen on 15.3.1 and 15.4 devices), and we’re currently running BigFix 11.0.3. However, a post from 2 years ago, "BESAgent on macOS just stopping?", describes our issue almost exactly, so I think this has been happening for a while across several releases.
Has anyone else seen this recently, or have a fix or something specific to look into? Thank you.
I’ve started using a watchdog script in another platform to try to resuscitate BigFix if it’s been too long since a log file was created (so far the only indicator I’ve been able to find that something is wrong):
#!/bin/zsh
# Watchdog: restart the BigFix agent if no new log file has been created recently.
logDir="/Library/Application Support/BigFix/BES Agent/__BESData/__Global/Logs"

if [[ ! -f "/Library/BESAgent/BESAgent.app/Contents/MacOS/BESAgent" ]]; then
    echo "BigFix does not exist on target machine"
    exit 40
fi
sleep 8

# Name of the newest log file without its extension, e.g. 20250304
lastLog=$(basename "$(ls "$logDir" | sort -Vr | head -n1)" .log)
currDate=$(LANG=en_US date '+%Y%m%d' 2>&1)
# BSD date takes fields missing from the format from the current time, so this is effectively "now"
currEpoch=$(date -jf "%Y%m%d" "$currDate" +%s)
# End of the day on which the last log file was created
lastEpoch=$(date -jf "%Y%m%d:%H%M%S" "${lastLog}:235959" +%s)
lastAction="None"

# Restart if more than 8 hours have passed since the end of the last logged day
if (( currEpoch - lastEpoch > 8*60*60 )); then
    lastAction="Restart"
    launchctl stop com.bigfix.BESAgent
    sleep 10
    # No PID in the listing means launchd has not relaunched the agent, so start it explicitly
    if launchctl list com.bigfix.BESAgent | grep -vqz "PID"; then
        launchctl start com.bigfix.BESAgent
    fi
fi
sleep 10

# Report the outcome, re-reading the newest log name and the last line of today's log
newestLog=$(basename "$(ls "$logDir" | sort -Vr | head -n1)" .log)
lastLine=$(tail -1 "$logDir/$(LANG=en_US date '+%Y%m%d').log" | grep -v "^.$" | tail -n 1 | tr -d ' ')
echo "LastRun:$currDate LastAction:$lastAction LastLog:$newestLog LastLine:$lastLine"
However, I’d prefer to avoid such a hacky solution.
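A variation on the watchdog above, as a minimal sketch: instead of parsing the date out of the newest log file name, it asks `find` whether any `.log` file was modified within the last N minutes. The function name `logs_are_stale` is illustrative, and the 480-minute threshold simply mirrors the 8 hours used above.

```shell
# Returns 0 (true) when no *.log file under the directory was modified
# within the last $2 minutes, i.e. the agent looks stalled.
# A missing or empty directory also counts as stale.
logs_are_stale() {
    local log_dir="$1" max_age_min="$2"
    # find prints every log modified inside the window; empty output means stale
    if [ -n "$(find "$log_dir" -type f -name '*.log' -mmin "-$max_age_min" 2>/dev/null)" ]; then
        return 1
    fi
    return 0
}

# Usage on the Mac itself (commented out so the snippet has no side effects):
#   log_dir="/Library/Application Support/BigFix/BES Agent/__BESData/__Global/Logs"
#   if logs_are_stale "$log_dir" 480; then
#       launchctl stop com.bigfix.BESAgent
#   fi
```

Because this looks at modification time rather than the file name, it can notice a stall within the same day. A machine that was asleep for the whole window will also look stale, so the threshold needs the same care as before.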
We have also encountered issues with Mac clients not reporting, though not exactly the same as the one you’re experiencing. Initially, we tried several manual fixes, but we couldn’t pinpoint which client was facing which specific issue. This lack of visibility led to a negative perception of BigFix. To address this, we developed a Self Heal script for our Mac devices.
The Self Heal script performs the following steps:
Checks the keystorage folder to ensure the client has all the necessary files, confirming that the keystorage folder is healthy and the client registration was successful (based on discussions with the HCL team).
Verifies the BESClient service is running. If the service is found running and the BESClient log shows “Report Posted Successfully,” the script exits.
Starts the BESClient service if it is not running.
Re-checks the BESClient log for “Report Posted Successfully.”
If the checks don’t pass, the script uninstalls and re-installs the client.
If, after reinstallation, BESClient still isn’t functioning properly, the script updates the JAMF logs to notify the engineering team.
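The decision logic in the steps above might look roughly like the sketch below. This is not the actual Self Heal script. The function names are made up, "healthy keystorage" is approximated as "folder exists and is not empty" because the required files are not listed here, and treating a failed keystorage check as a reason to reinstall is an assumption. The log marker is matched case-insensitively.

```shell
# True when the keystorage folder exists and contains at least one entry.
# Assumption: "healthy" is approximated as "non-empty".
keystorage_ok() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# True when the given client log contains the success marker (any case).
report_posted() {
    [ -f "$1" ] && grep -qi "Report posted successfully" "$1"
}

# Prints the next action: "none", "start", or "reinstall".
#   $1 = keystorage folder, $2 = client log file, $3 = "yes" if the service is running
next_action() {
    if ! keystorage_ok "$1"; then
        echo "reinstall"      # assumed: a broken keystorage goes straight to reinstall
    elif [ "$3" != "yes" ]; then
        echo "start"          # service is down: start it, then call next_action again
    elif report_posted "$2"; then
        echo "none"           # running and reporting: nothing to do
    else
        echo "reinstall"      # running but never reported successfully
    fi
}
```

The caller would run `next_action` once, start the service if told to, wait, and run it again with `yes`. Only a second failure leads to the uninstall and reinstall step, and the notification to the engineering team sits outside this sketch.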
This solution has been a lifesaver for us.
When we first implemented the Self Heal script, we discovered many duplicate devices. Once those issues were addressed and fixed, the duplication stopped, and we haven’t encountered any further issues since.
To conclude, from the Mac team’s perspective, they created a JAMF policy to deploy a daemon that executes the Self Heal script daily at an agreed interval. This automated approach ensures continuous monitoring and self-healing of the Mac clients, further streamlining the process and maintaining device health without manual intervention.
Could you expand a bit on the criteria you’re looking for in the keystorage folder? Is it just the existence of the cert files?
How are you handling the reinstall/secure deployment of afxm files, etc.? I wouldn’t want to just throw them up in a public S3 bucket, but we have machines that are offsite, so I can’t just use an internal file server or limit access to our network. Are you re-using the afxm already on the device, or have you seen issues with the afxm missing or being corrupted as well?
Do you have any metrics on how often the script needs to take an action, whether it be restarting the BESAgent service or fully reinstalling?
For some more background info, I just recently noticed this happening on my own device. Logs have not been created since 1400 yesterday, and it’s about 1100 today, with the device being online and in-use for several hours between.
BESAgent is running, but no log file is being created, and the device is not checking in or seemingly even attempting to.
I was moving around a bit between buildings yesterday — maybe closing my laptop at the exact wrong time could have caused this? Without logs, I’m unsure how to diagnose.
I recommend enabling debug client logging to gain deeper visibility into what’s happening behind the scenes. This will help identify any hidden issues that might not surface in standard logs.
Screenshot taken today, 2025-05-12, and the debug log was last written to on 2025-05-08, around the same time the last lines were written to the regular logs: