Relay & Agent Resilience

We’ve recently encountered a situation where we lost connection to almost half of our agents because a misconfigured analysis was run.

I started by restarting all of the BES Root Server services in the hope that the agents and relays reporting to it would reconnect, but that didn’t work. I then moved on to the relays we had lost (around 65 at that point) and restarted them manually, but that didn’t resolve the overall problem either: the agents still weren’t talking to the relays, even though all of them had their services running.

I planned to open an RFE (or “idea”, as they are now called), but I’m kind of lost as to what I should even be asking for, so I wanted to reach out to the forum to gauge opinions, hear stories of similar issues, and find out what you did to resolve them.

So, some questions:

  • Do the agents / relays have some resilience features that I’m missing?
  • Does BigFix 10’s agentless capability fix this?
  • Is there any way (that I’ve not considered) to get agents reporting again through the relay when they end up in this state (once we get them back)?

Hi John. Do you have a support case with L2 regarding this matter? If not, does an internal post-mortem exist from this incident that you could share? You can DM me here and I’ll provide you with my HCL email address in order to assist further.

Hey Casey,

It’s all under way (I did it after I wrote this, lol), but I wanted to check with the community whether anyone had experienced it before.

The more eyes on it the better, so I’ll drop you a message too 🙂 Thanks, mate.


What is the idea link? We will promote it too! We have also faced this kind of problem, and BigFix doesn’t have any working solution for getting a client back once it has lost communication for whatever reason.

There should be some auto-heal functionality, but unfortunately no one has ever worked on that.

Recently we had to ask our OS support team to restart the BESClient service on the impacted machines. Around 40% of them were reachable through other monitoring or automation tools, which helped, but for the rest the owners have to log in and restart the service manually, and we are still working on it. It is a horrible situation when BigFix loses control of a huge number of machines because of a change to the BigFix infrastructure.

https://bigfix-ideas.hcltechsw.com/ideas/BFP-I-161
