Winsock error 8 vs AntiVirus

atlauren · November 26, 2014, 6:44am

Recently we acquired a couple of endpoints that have shown very odd behavior. They register and talk to their relays for a spell, and then go dark. The endpoints themselves are up and behaving normally, but the agent no longer communicates with the server. On inspection of the agent log, right around the time it went dark it has winsock error -8… and nothing since then.

As it happens, both endpoints have AVG Free installed, of questionable vintage. Could it be that AVG’s network monitoring has clamped down and is preventing the agent from communicating?

jmaple · November 26, 2014, 1:00pm

Have you tried disabling the AV temporarily and restarting the client to get it to check in?

atlauren · November 26, 2014, 5:26pm

That’s going to be my next trick. I was wondering, however, if AVG in general is known to be troublesome. We’ll likely move the endpoints to one of our usual packages, but I’m taking baby steps on these endpoints because reasons.

AlanM · November 26, 2014, 5:44pm

I note that AVG has a firewall in some of their iterations as well.

Does the client ever report? (even once a day?) That is sometimes the symptom of a firewall blocking it.

There is an inspector that can tell you whether any UDP messages are reaching the client:

last command time of client

which records the time of the last UDP message. If that is more recent than the last registration time then you are getting UDP messages.

atlauren · November 27, 2014, 12:22am

Once it goes dark, the client isn’t reporting at all. Unfortunately it’s on private IP space, and by design behind a highly restrictive NAT. I can’t get UDP to it even normally. The client has a manual configuration to talk to a specific relay, but after a while seems to default back to the core server, which isn’t allowed in the network egress rules. It seems to get ‘stuck’ in a state of trying to talk to the core server and hunting for relays, but can’t locate its intended relay.

For added fun, the client is XP-based WEPOS. 2.5GHz Celeron, 1GB RAM. Apparently AVG came from the vendor!

As a test I’ve removed AVG and substituted MS Endpoint Protection. I was on-site about an hour ago, and they were talking with their relay at that time. Now, however, the BF console indicates they are offline again.

JasonWalker · November 27, 2014, 4:27pm

On the client, can you run the Fixlet Debugger, and post the results of
Q: settings of client

That would help diagnose what’s going on. I’m still suspecting that it is switching to Automatic Relay Select Mode, but is unable to select any Relay because ICMP is blocked to all of them.
Without ICMP, the client will not even attempt to contact a relay over the TCP port.

TimRice · November 30, 2014, 2:32pm

Sounds like this system needs to use Command Polling and a Failover Relay.

_BESClient_Comm_CommandPollEnable
_BESClient_Comm_CommandPollIntervalSeconds
_BESClient_RelaySelect_FailoverRelay

Client settings documentation

It sounds like because the client can’t be reached by UDP, and can’t contact the Main BES Server, it needs to be configured to default to a specific Relay, and check with that Relay on a regular basis to find any pending actions.
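As a rough sketch of how those three settings fit together, the snippet below builds them as name/value pairs. The setting names come from the list above; the value `1` for enabling polling, the 3600-second interval, and the relay URL form are illustrative assumptions, so check them against the client settings documentation before deploying anything.

```python
def polling_settings(failover_relay_url, poll_seconds=3600):
    """Return command-polling and failover settings as a name -> value dict.

    The setting names are the ones listed in this thread. The values are
    illustrative assumptions: "1" to enable polling, and a caller-supplied
    interval and failover relay URL.
    """
    if poll_seconds <= 0:
        raise ValueError("poll interval must be a positive number of seconds")
    return {
        "_BESClient_Comm_CommandPollEnable": "1",
        "_BESClient_Comm_CommandPollIntervalSeconds": str(poll_seconds),
        "_BESClient_RelaySelect_FailoverRelay": failover_relay_url,
    }


# Hypothetical relay URL, for illustration only.
settings = polling_settings("http://relay.example.com:52311/bfmirror/downloads/")
for name, value in settings.items():
    print(f"{name}={value}")
```

With polling enabled, the client asks the relay for pending actions on its own schedule, so it does not depend on inbound UDP notifications getting through the NAT.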

AgentGuy · December 1, 2014, 4:43pm

Winsock -8 means the agent is able to resolve the relay’s IP address, but the socket ‘connect’ call failed. This can happen for any number of reasons.

What happens if you try to connect to the relay using a web browser on the device when the agent is stuck getting this error code? Can you troubleshoot it with normal networking utilities?
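If a browser isn’t handy on the device, the same check can be sketched in a few lines of Python. This is only a minimal TCP connect test, not what the agent itself does; the function name is made up for illustration, and port 52311 is assumed to be the relay’s listening port (the usual default), so substitute your own host and port.

```python
import socket


def can_connect(host, port=52311, timeout=5.0):
    """Try a plain TCP connect to host:port.

    Returns (True, None) on success, or (False, error_text) if name
    resolution, the connect call, or the timeout fails.
    """
    try:
        # create_connection resolves the name and performs the connect;
        # the with-block closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers refused, unreachable, timeout, DNS errors
        return False, str(exc)
```

Calling `can_connect("relay.example.com")` (a placeholder hostname) while the agent is stuck would show whether the failure is specific to the agent or affects any TCP connection from that machine to the relay.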

atlauren · December 2, 2014, 2:32am

Thanks folks, y’all were on the right track. Turns out the issue was a mix of very tight firewall controls, both at the client network and the relay’s network. ICMP echo requests and replies were blocked.

As background, these clients have a manual configuration for one specific relay, and polling. They would connect as expected after installation, but after a while “disappear” from the server connections.

In addition, a contributing factor was an always-on action that enables automatic relay selection. I didn’t realize that relay selection relies on ICMP. So, while these endpoints have manual configurations for precisely their own relay, enabling auto relay selection (however redundant) caused them to fail their relay and fall back to trying to connect to their configured root server. Thus, they “disappeared”.

Antivirus wasn’t related at all. This prompts some rethinking on how I’m doing relays. Hmmm…

atlauren · December 2, 2014, 4:56am

FYI, I posted to this page and recommended that the language be refined to specify the need for ICMP in auto relay selection.
https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Tivoli%20Endpoint%20Manager/page/TEM%20Relays

JasonWalker · December 2, 2014, 5:08am

Yes, ICMP is required for auto relay selection, and introduces its own quirks. The client uses the hop count from ICMP replies to select what it thinks is the closest relay, but it can’t tell that three hops across 10Gb Ethernet is better than two hops across a 1Mb WAN link. That’s where Relay Affiliation groups can help.
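To make the hop-count quirk concrete, here is a toy illustration. It is not the agent’s actual selection code; the relay names and link speeds are invented, and the only rule modelled is “fewest hops wins”.

```python
# Two hypothetical relays: one close in hops but behind a slow WAN link,
# one further away in hops but reachable over fast Ethernet.
relays = [
    {"name": "relay-lan", "hops": 3, "link_mbps": 10_000},
    {"name": "relay-wan", "hops": 2, "link_mbps": 1},
]


def pick_by_hops(candidates):
    """Pick the candidate with the fewest hops, ignoring link speed."""
    return min(candidates, key=lambda relay: relay["hops"])


chosen = pick_by_hops(relays)
print(chosen["name"], chosen["link_mbps"])  # prints: relay-wan 1
```

A hop-count-only rule lands on the relay behind the 1Mb link, which is the kind of choice affiliation groups are meant to steer away from.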

In my deployments, when I install the client I first use a script to have ‘wget’ try to connect to a list of my defined relays, and use that result to build a client settings config file with the responding relays preconfigured, since you can only specify two relays in the config file.
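The approach described above could be sketched roughly as follows, using a plain TCP connect in place of ‘wget’. This is not the actual install script: the function names are hypothetical, port 52311 is assumed, and the `__RelaySelect_Automatic` / `__RelayServer1` / `__RelayServer2` setting names and URL form are recalled from common BigFix usage rather than taken from this thread, so verify them against the client settings documentation.

```python
import socket


def first_responding(relays, port=52311, limit=2, timeout=3.0):
    """Return up to `limit` hosts from `relays` that accept a TCP connection."""
    responding = []
    for host in relays:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                responding.append(host)
        except OSError:
            continue  # unreachable, refused, timed out, or unresolvable
        if len(responding) == limit:
            break
    return responding


def relay_settings(hosts, port=52311):
    """Build name=value lines for manual relay selection (assumed names)."""
    lines = ["__RelaySelect_Automatic=0"]
    for index, host in enumerate(hosts, start=1):
        lines.append(
            f"__RelayServer{index}=http://{host}:{port}/bfmirror/downloads/"
        )
    return lines
```

For example, `relay_settings(first_responding(["relay-a.example.com", "relay-b.example.com", "relay-c.example.com"]))` would yield lines for at most two relays that answered on the TCP port, without relying on ICMP at all.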

We have policy actions in place to apply Auto Relay Select on the IP Subnets where ICMP to relays is allowed, and manual relay settings on those subnets where it’s not.

atlauren · December 2, 2014, 5:20am

Hmmm. My standard clients already use relay affiliation groups, and in our network architecture hop count is nearly irrelevant.

I wonder if there’s any point to auto relay selection at all, in our case?

JasonWalker · December 2, 2014, 12:34pm

Affiliation only works in conjunction with Auto Relay Select.