Recently we acquired a couple of endpoints that have shown very odd behavior. They register and talk to their relays for a spell, and then go dark. The endpoints themselves are up and behaving normally, but the agent no longer communicates with the server. On inspection of the agent log, right around the time it went dark it has winsock error -8… and nothing since then.
As it happens, both endpoints have AVG Free installed, of questionable vintage. Could it be that AVG’s network monitoring has clamped down and is preventing the agent from communicating?
That’s going to be my next trick. I was wondering, however, if AVG in general is known to be troublesome. We’ll likely move the endpoints to one of our usual packages, but I’m taking baby steps on these endpoints because reasons.
Once it goes dark, the client isn’t reporting at all. Unfortunately it’s on private IP space, and by design behind a highly restrictive NAT. I can’t get UDP to it even normally. The client has a manual configuration to talk to a specific relay, but after a while seems to default back to the core server, which isn’t allowed in the network egress rules. It seems to get ‘stuck’ in a state of trying to talk to the core server and hunting for relays, but can’t locate its intended relay.
For added fun, the client is XP-based WEPOS. 2.5GHz Celeron, 1GB RAM. Apparently AVG came from the vendor!
As a test I’ve removed AVG and substituted MS Endpoint Protection. I was on-site about an hour ago, and they were talking with their relay at that time. Now, however, the BF console indicates they are offline again.
On the client, can you run the Fixlet Debugger, and post the results of
Q: settings of client
That would help diagnose what’s going on. I still suspect that it is switching to Automatic Relay Select Mode, but is unable to select any Relay because ICMP is blocked to all of them.
Without ICMP, the client will not even attempt to contact a relay over the TCP port.
It sounds like because the client can’t be reached by UDP, and can’t contact the Main BES Server, it needs to be configured to default to a specific Relay, and check with that Relay on a regular basis to find any pending actions.
Winsock -8 means the agent is able to resolve the relay’s IP address, but the socket ‘connect’ call failed. This can happen for any number of reasons.
What happens if you try to connect to the relay using a web browser on the device when the agent is stuck getting this error code? Can you troubleshoot it with normal networking utilities?
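If a browser isn't handy, the same check can be scripted from any machine on the client's network segment. This is only a rough sketch: it does a plain TCP connect and reports whether name resolution or the connect itself failed, which is the distinction the -8 error is pointing at. The function name is made up for illustration, and it is not how the agent itself is implemented.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Attempt a plain TCP connect and say which stage failed, if any."""
    try:
        # create_connection resolves the name and then calls connect().
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # The name could not be resolved at all.
        return False, f"name resolution failed: {exc}"
    except OSError as exc:
        # Resolved fine, but the connect was refused, filtered, or timed out.
        return False, f"connect failed: {exc}"
```

Calling something like can_connect("relay.example.com", 52311) (hypothetical hostname; 52311 is assumed to be the relay's listening port, so substitute whatever your deployment uses) while the agent is stuck would show whether the TCP path to the relay is open at that moment.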
Thanks folks, y’all were on the right track. Turns out the issue was a mix of very tight firewall controls, both at the client network and the relay’s network. ICMP and reply were blocked.
As background, these clients have a manual configuration for one specific relay, and polling. They would connect as expected after installation, but after a while “disappear” from the server connections.
In addition, a contributing factor was an always-on action that enables automatic relay selection. I didn’t realize that relay selection relies on ICMP. So, while these endpoints have manual configurations for precisely their own relay, enabling auto relay selection (however redundant) caused them to lose their relay and fall back to trying to connect to their configured root server. Thus, they “disappeared”.
Antivirus wasn’t related at all. This prompts some rethinking on how I’m doing relays. Hmmm…
Yes, ICMP is required for auto relay selection, and it introduces its own quirks. The client uses the hop count from ICMP replies to select what it thinks is the closest relay, but it can’t tell that three hops across 10Gb Ethernet is better than two hops across a 1Mb WAN link. That’s where Relay Affiliation groups can help.
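To make the hop-count quirk concrete, here is a toy model. It is a simplified illustration, not the client's actual selection logic: the relay names, the group labels, and the fall-back-to-all-relays behavior are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    name: str
    hops: int         # what hop-count-based selection can see
    link_mbps: float  # what it cannot see
    group: str        # hypothetical affiliation group label


def pick_by_hops(relays):
    """Pick the relay with the fewest hops; link speed is ignored."""
    return min(relays, key=lambda r: r.hops)


def pick_with_affiliation(relays, group):
    """Prefer relays in the given group, else consider all relays."""
    preferred = [r for r in relays if r.group == group]
    return pick_by_hops(preferred or relays)


RELAYS = [
    Relay("lan-relay", hops=3, link_mbps=10_000, group="site-a"),
    Relay("wan-relay", hops=2, link_mbps=1, group="site-b"),
]
```

With these made-up numbers, pick_by_hops(RELAYS) chooses wan-relay because two hops beats three, even though the link is far slower. Restricting the candidates first, as in pick_with_affiliation(RELAYS, "site-a"), is roughly the effect an affiliation group is meant to have.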
In my deployments, when I install the client I first use a script that has ‘wget’ try to connect to a list of my defined relays. I use the results to build a client settings config file with the responding relays preconfigured, keeping only the first two because you can only specify two relays in the config file.
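For anyone who wants to try the same idea, here is a rough Python sketch of it. It swaps in a plain TCP connect in place of wget, and the port, the URL path, and the setting names (__RelaySelect_Automatic, __RelayServer1, __RelayServer2) are written from memory as assumptions, so check them against the documentation for your version before relying on them. The function names are made up.

```python
import socket

RELAY_PORT = 52311  # assumption: relay listening port in this deployment


def responds(host, port=RELAY_PORT, timeout=3.0):
    """Return True if a TCP connect to the relay succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_client_settings(candidates, probe=responds, max_relays=2):
    """Build config text from the first relays that answer the probe."""
    reachable = []
    for host in candidates:
        if probe(host):
            reachable.append(host)
        if len(reachable) == max_relays:
            break  # only two relay slots exist, so stop probing
    # Assumed setting names and URL layout; verify before use.
    lines = ["__RelaySelect_Automatic=0"]
    for i, host in enumerate(reachable, start=1):
        lines.append(
            f"__RelayServer{i}=http://{host}:{RELAY_PORT}/bfmirror/downloads/"
        )
    return "\n".join(lines) + "\n"
```

The probe is passed in as a parameter so the list-building logic can be tried out without touching the network. Note this only proves the TCP port is reachable at install time; it says nothing about ICMP.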
We have policy actions in place to apply Auto Relay Select on the IP Subnets where ICMP to relays is allowed, and manual relay settings on those subnets where it’s not.
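The decision those policy actions make boils down to a subnet membership test. In practice that is expressed in the action's applicability relevance, not in Python, so treat this as an illustration of the logic only; the subnet list and function name are invented for the example.

```python
import ipaddress

# Hypothetical subnets where ICMP to the relays is permitted.
ICMP_ALLOWED_SUBNETS = [
    ipaddress.ip_network("10.10.0.0/16"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def relay_mode(client_ip, allowed=ICMP_ALLOWED_SUBNETS):
    """Return 'automatic' where ICMP works, 'manual' everywhere else."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in allowed):
        return "automatic"
    return "manual"
```

A client at 10.10.4.7 would get automatic selection under this made-up list, while one at 172.16.1.20 would stay on its manually configured relay.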