I thought I would throw this out there while I am compiling data for a PMR.
We have been having intermittent issues with heartbeat and action response from a large number of random servers. We upgraded to 9.5.11 on Jan 30. Since then SQL has dropped about 8 times, and I have had to bounce the FillDB service each time. Parallelism is enabled. Enhanced Security is enabled.
Came in yesterday (Monday) and about 1/3 (5k) of all endpoints were grey in the console with a last check-in time of 3/2 (Friday). Did the usual (clear cache, refresh, etc.), no change. The weird thing, though, is that the BESClient logs on multiple endpoints showed no issue. According to the logs, endpoints were reporting, syncing successfully, and responding to actions, but none of it was showing up in the console. Case in point was my own PC, which was on with no issues all weekend.
This is what I saw at 10am 3/4.
Client log history was missing 3/2-3/2
This is the response after a Force Refresh:
At 10:32:57 -0800 -
ForceRefresh command received. Version difference, gathering action site.
At 10:33:35 -0800 -
Successful Synchronization with site ‘actionsite’ (version 1070603) - ***
At 10:33:46 -0800 -
Gathering all operator/mailbox sites.
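For anyone who wants to compare client activity against what the console shows, here is a minimal sketch that scans a client log for quiet periods. It assumes the "At HH:MM:SS -0800 -" header format seen in the excerpt above and that all entries come from a single day; `parse_entries`, `find_gaps`, and the 30-second threshold are made-up names and values for illustration, not anything from BigFix.

```python
import re
from datetime import datetime, timedelta

# Header lines look like "At 10:32:57 -0800 -" in the excerpt above.
HEADER = re.compile(r"^At (\d{2}:\d{2}:\d{2}) ([+-]\d{4}) -")


def parse_entries(lines):
    """Group message lines under the timestamp header that precedes them."""
    entries = []
    for line in lines:
        text = line.strip()
        match = HEADER.match(text)
        if match:
            # No date in the header, so all entries are assumed to share one day.
            stamp = datetime.strptime(" ".join(match.groups()), "%H:%M:%S %z")
            entries.append((stamp, []))
        elif entries and text:
            entries[-1][1].append(text)
    return entries


def find_gaps(entries, threshold=timedelta(minutes=30)):
    """Return (start, end) pairs where consecutive entries are further apart than threshold."""
    gaps = []
    for (prev, _), (cur, _) in zip(entries, entries[1:]):
        if cur - prev > threshold:
            gaps.append((prev, cur))
    return gaps


# The excerpt from this post, used as sample input.
sample = """At 10:32:57 -0800 -
   ForceRefresh command received. Version difference, gathering action site.
At 10:33:35 -0800 -
   Successful Synchronization with site 'actionsite' (version 1070603) - ***
At 10:33:46 -0800 -
   Gathering all operator/mailbox sites.
""".splitlines()

entries = parse_entries(sample)
gaps = find_gaps(entries, threshold=timedelta(seconds=30))
for start, end in gaps:
    print(f"No log activity between {start:%H:%M:%S} and {end:%H:%M:%S}")
```

On a real log a threshold of minutes or hours would make more sense. If the client log shows no gaps while the console still shows a stale last report time, that points at the server side rather than the endpoint.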
Ran BESClient Diagnostics and no errors came back. Bounced FillDB to change the log file size, and my PC finally checked in at 2pm. When it checked in, all of the test Fixlets I had run over the previous few hours went from Not Reported to Completed. This was not an issue prior to the upgrade. Also, I was chatting with Mark Leaphart the whole time and he was stumped.
Anyone have anything I should look at?
Edit - Just checked and this endpoint has not checked in for 2.5 hours today.