"invalid action site epoch"

I have endpoints that won’t check in, and in the FillDB logs there is an error for those:

Discarding message from computer 15160336 because it has the invalid action site epoch ‘12 Jul 2018 18:53:44’ (ought to be ‘21 Jan 2020 14:30:34’).

Looking at the client logs, it looks like the client can get to the server but is not able to communicate with it.

The fixes in the forum that I have found point to a KB article on the IBM site that has been taken down.

Anyone know how to resolve this?

Thank you!

Do you have multiple BigFix deployments, separate licenses, or something along those lines? Reading this message, it seems like the actionsite.afxm on the client is tied to the wrong deployment.

You should consider opening a support case on this, but if you are open to some testing I’d suggest stopping the BES Client service, replacing the actionsite.afxm in the client folder with a known-good copy from your root server, and restarting the client.
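Roughly, that test might look like the PowerShell sketch below. It assumes the default client install path on Windows, and the source path for the known-good masthead is just a placeholder for wherever you stage the copy from your root server:

    # Sketch only: replace the client masthead with a known-good copy (default install path assumed)
    $clientDir = 'C:\Program Files (x86)\BigFix Enterprise\BES Client'
    $goodCopy  = '\\rootserver\staging\actionsite.afxm'   # placeholder: known-good masthead from the root server

    Stop-Service -Name 'BESClient'

    # Keep a backup of the current masthead before overwriting it
    Copy-Item -Path (Join-Path $clientDir 'actionsite.afxm') -Destination (Join-Path $clientDir 'actionsite.afxm.bak') -Force
    Copy-Item -Path $goodCopy -Destination (Join-Path $clientDir 'actionsite.afxm') -Force

    Start-Service -Name 'BESClient'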

Single deployment, same license. We recently rebuilt the server, which is where the difference in dates comes from. The endpoints that aren’t checking in are the same ones that wouldn’t check in before the server was rebuilt. I have asked the tech support person doing the work to try what you mentioned, as the logs suggest the client isn’t able to update the actionsite. I’m wondering if there might be a domain policy issue, as the groups that aren’t checking in are in the same Group Policy OU. As a side note, one of the endpoints having the issue checking in is the BES server itself.

Thanks for the help!

Is this happening with all of your clients, or only a few of them?

Ultimately, what you need to have happen is for the Clients in question to properly synchronize with the actionsite version that is being hosted by the Root Server. This will update the actionsite’s ‘epoch’ on the Client, and will allow it to match that of the Server.

So, in the Client log, where it is indicating that it is unable to update the actionsite, are there more details you can share?

If the BES Server itself is exhibiting this issue, you’ll want to ensure that the actionsite is being properly hosted by the BES/Root Server (sounds like it might not be).
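If it helps, here is a quick way to pull those details out of the client logs (default install path assumed; the log files are named by date):

    # Sketch: grab actionsite/gather-related lines from the three most recent client logs
    $logDir = 'C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\__Global\Logs'

    Get-ChildItem -Path $logDir -Filter '*.log' |
        Sort-Object LastWriteTime -Descending |
        Select-Object -First 3 |
        Select-String -Pattern 'actionsite', 'gather', 'error' |
        Select-Object -ExpandProperty Line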

As Jason suggests, this really warrants a support case 🙂

Thanks everyone. It is only happening with a few of them. We are still trying to hunt down the issue. I’m leaning towards a group policy problem; I don’t think it is a BigFix issue. I just wish I knew what specific things could cause this, such as Windows Firewall or a file permissions issue.
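In case it’s useful, this is roughly what I’m planning to have him run from one of the affected endpoints to rule the firewall in or out (52311 is the default BigFix port; the relay name below is just a placeholder for whatever the client should be registering with):

    # Can the endpoint actually reach its relay/root server on the BigFix port?
    Test-NetConnection -ComputerName 'bigfix-relay.example.local' -Port 52311

    # Any Windows Firewall rules that mention the BES client?
    Get-NetFirewallRule |
        Where-Object { $_.DisplayName -like '*BES*' -or $_.DisplayName -like '*BigFix*' } |
        Select-Object DisplayName, Enabled, Direction, Action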

This has been slow going, as the work is being done remotely and patching the January criticals is the priority, so the ones that are not checking in will be patched manually. Approximately 14 machines.

Oh, and my tech support guy did do some troubleshooting with someone at HCL, and so far they have had him try things that I’ve already suggested. Not sure if they opened an actual ticket.

If it’s just 14 machines you could delete the __BESData folder and force the clients to reset themselves. Certainly a sledgehammer approach, but it should work. Be aware that by doing this you will end up with duplicates of those machines in the console. You can remove the old ones either by hand or by running the duplicate computer removal process via the BES Admin tool: https://help.hcltechsw.com/bigfix/9.5/platform/Platform/Installation/c_clean_up_computer.html
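Something like this, assuming the default client install path (the client will re-register and re-gather all of its sites afterwards, so expect a little extra load on the relay):

    # Sledgehammer: wipe the client's local site data and let it rebuild from scratch
    $besData = 'C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData'

    Stop-Service -Name 'BESClient'
    Remove-Item -Path $besData -Recurse -Force
    Start-Service -Name 'BESClient'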

Worth noting that marking a computer as deleted just means that it will no longer show up in the console but the data will remain in the database in case you need to access it in the future. Full deletion would only occur if you run the Remove Deleted Computers option.

Thank you dmccalla for the tip, but all of the usual things were tried. An odd thing happened over the weekend on this. The site had to work over the weekend to get the January criticals installed, so I gave them instructions on how to manually patch the systems that were not checking in. I just got word that after the patching and associated reboots, all the servers that were unable to check in previously are now checking in. I’m still leaning toward a group policy issue because all those servers were in the same GP OU. That still leaves the workstations at their site that can’t check in, which brings back up the possibility of a network issue.

Makes sense. Hard to say without a closer look, but some things that come to mind include

  • A GPO meant to install the client on new computers repeatedly reinstalling the client.
  • A GPO intended to apply client settings through the registry, overwriting the newer/correct settings applied through a BES Action (a quick way to check this is sketched at the end of this post).
  • Another management product altogether clobbering client settings or BESData folders (once had a customer incorrectly clobbering the client data via Puppet).
  • A third-party security product blocking site gathers; it may need to whitelist the BESData folders. I recently encountered Carbon Black blocking a Console from building its cache.
  • A problem on the Relay that clients in this OU happen to be using; possibly cleared when the Relay was rebooted for patching.

The fact they were all in the same OU points toward GPO, but it could be they also shared a common third-party tool or configuration, like “This OU also gets a special antivirus policy, so your global whitelist was not in effect.”
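If you want a quick way to check the GPO/registry angle from one of the affected machines, something along these lines shows both which policies are applying and what is currently sitting in the client settings key (the Wow6432Node path assumes 64-bit Windows running the 32-bit client):

    # Which GPOs are applying to this computer?
    gpresult /scope computer /r

    # Dump the BES client settings as the client sees them in the registry
    Get-ChildItem 'HKLM:\SOFTWARE\Wow6432Node\BigFix\EnterpriseClient\Settings\Client' |
        ForEach-Object {
            [pscustomobject]@{
                Setting = $_.PSChildName
                Value   = (Get-ItemProperty -Path $_.PSPath).value
            }
        }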

Good point, Jason. We have had issues with McAfee exception policies in the past (well, all sorts of problems with McAfee, but that is another discussion). Usually one of our first troubleshooting steps in these scenarios is to disable McAfee. I can’t remember if the guy doing the work tried this, but he is sharp when it comes to these things.