SOCKET RECEIVE (winsock error 4294967286)

I have a 9.5.4 Relay (Win2008R2 VM), and many Clients connecting to it are showing this error in their BES client log files and are unable to post results to it (yes, other Clients using it are able to).

The client logs on the non-working clients (also v9.5.4.38) look like this:

   At 23:41:00 +0000 - 
       Error posting report to: 'http://relay:52311/cgi-bin/bfenterprise/PostResults.exe' (General transport failure.
    SOCKET RECEIVE (winsock error 4294967286)
    At 23:46:03 +0000 - 
       FAILED to Synchronize - General transport failure. - SOCKET RECEIVE (winsock error 4294967286 - gather url - http://relay:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://root:52311/cgi-bin/bfgather.exe/actionsite&Time=28Nov23:46:03&rand=ed17bf6e&ManyVersionSha1=93d5664261c25e6a82fc1e71c501d80dfccb7df4
    At 23:46:04 +0000 - 

I’ve rebooted the Relay and have uninstalled the failing clients with no resolution.

There is one occurrence of the following line in the BESRelay.log file, but with only one, I don’t think it is related.

Wed, 29 Nov 2017 06:09:37 -0600 - /cgi-bin/bfenterprise/clientregister.exe (2136) - Uncaught exception in plugin ClientRegister with client x.x.x.x: Socket Error: Windows Error 0x2745%: An established connection was aborted by the software in your host machine.

Is the “host machine” the Client or the Relay, and does anyone know what “winsock error 4294967286” points to or what else I can try?

thanks

Winsock errors in the logs typically relate to communication problems. Have you tried hitting the ‘actionsite’ url of this relay’s parent? Could a firewall, routing, proxy, or network change have impacted its ability to communicate? Try telnetting to the port to verify it is open.
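If telnet is not installed on the endpoint, the same port check can be sketched in a few lines of Python. This is only a minimal sketch: the function name `port_is_open` and the 5-second timeout are assumptions for illustration, and 52311 is the port shown in the log URLs in this thread. A successful connect only shows that the port is reachable; it does not prove that data can be received afterwards.

```python
import socket


def port_is_open(host: str, port: int = 52311, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs name resolution and the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # "relay" is a placeholder host name, as in the log excerpts above.
    print(port_is_open("relay"))
```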

Are other peer relays and clients working ok? Are they all experiencing the same issue?

The client can successfully telnet to the relay on 52311. Other clients trying to use this Relay are experiencing the same issue, but not all clients using this Relay are.

If you point them to another relay do they work? Or not at all? One way to try is to temporarily turn down this relay, then cycle one of your clients. Does it start reporting correctly to another relay or root?

If it is limited to specific clients, are they having mailboxing issues?

If the issue persists when the clients try to connect to another known good relay, consider doing a client reset. (Stop the client service and delete these reg keys under \Bigfix\EnterpriseClient\GlobalOptions: ComputerID, RegCount, and ReportSequenceNumber. Delete the __BESData folder, then restart the client service.)

I uninstalled and reinstalled the Relay software and that appears to have resolved the problem, whatever it was. Thanks.

Next time this happens, or if other issues with your relay occur, you may want to first try a less invasive procedure to “reset” your relay before going the uninstall/reinstall route (which may still be needed in some instances):

See: How do I clean out and reset my BigFix relay machine?

We still get these errors, but IBM support doesn’t say what this specific winsock error means. In this case the winsock error 4294967286 was reported in the client logs; we rebooted the client server and then the error went away on that client (for the time being), even though it selected the same relay post-reboot. Can anyone provide any other troubleshooting steps? We’ve captured this with client debug enabled, but it doesn’t provide any more info.

At 18:06:06 +0000 - 
   RegisterOnce: Attempting secure registration with 'https://relay1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.11.191&Body=1079789893&SequenceNumber=10&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://root%3a52311&AdapterInfo=00-50-56-8b-52-8e_10.220.181.128%2f25_10.220.181.156_0&AdapterIpv6=00-50-56-8b-52-8e%5efe80%3a%3a34fa%3ababe%3a5a3d%3a91ce%2f64_0'
At 18:06:25 +0000 - 
   RegisterOnce: GetURL failed - General transport failure. - SOCKET RECEIVE (winsock error 4294967286 - registration url - http://relay1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.11.191&Body=1079789893&SequenceNumber=10&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://root%3a52311&AdapterInfo=00-50-56-8b-52-8e_10.220.181.128%2f25_10.220.181.156_0&AdapterIpv6=00-50-56-8b-52-8e%5efe80%3a%3a34fa%3ababe%3a5a3d%3a91ce%2f64_0

.
Restarted computer
.

At 18:10:07 +0000 - 
   RegisterOnce: Attempting secure registration with 'https://relay1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.11.191&Body=1079789893&SequenceNumber=12&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://root%3a52311&AdapterInfo=00-50-56-8b-52-8e_10.220.181.128%2f25_10.220.181.156_0&AdapterIpv6=00-50-56-8b-52-8e%5efe80%3a%3a34fa%3ababe%3a5a3d%3a91ce%2f64_0'
At 18:10:09 +0000 - 
   Unrestricted mode
   Configuring listener without wake-on-lan
   Registered with url 'https://relay1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.11.191&Body=1079789893&SequenceNumber=12&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://root%3a52311&AdapterInfo=00-50-56-8b-52-8e_10.220.181.128%2f25_10.220.181.156_0&AdapterIpv6=00-50-56-8b-52-8e%5efe80%3a%3a34fa%3ababe%3a5a3d%3a91ce%2f64_0'
  Registration Server version 9.5.11.191 , Relay version 9.5.11.191
   Relay does not require authentication.
   Client has an AuthenticationCertificate
   Relay selected: relay1. at: 159.42.129.105:52311 on: IPV4 (Using setting IPV4ThenIPV6)

Per the Client log above, the winsock error in this case translates to ‘SOCKET_RECEIVE’: A connection was made to the BES Server/BES Relay, but an error occurred when receiving data.

As far as troubleshooting steps, it can be a bit difficult in such cases, but seeing as how a reboot addressed it, it seems there was some transient issue preventing proper communication. Potentially something on the endpoint itself rather than on the network…

Could L3 say what code logic goes into a winsock 4294967286 error?

We have also seen 4294967290 errors.

Customers need more tools and diagnostics to be able to get to root cause of these. Setting debug log to 10000 does not seem to be good enough.

Disclaimer: I’m not on the Dev team, just my opinion here.

So part of the issue there is that the BigFix Client itself may not have insight into what is causing the problem. The fact that this is a Winsock error indicates a network error is being reported by Windows itself. BigFix is just using the Windows Sockets API (“Winsock”) with a request to download a file from the relay, but Winsock is reporting an error.

http://www-01.ibm.com/support/docview.wss?uid=swg21505977 may be somewhat helpful, but first you have to be able to translate the numbering…I’ll post back again shortly from a computer.

Those short return codes are hard to translate from the longer winsock errors and even if we have “10 err_SOCKET_RECEIVE A connection was made to the TEM Server/TEM Relay, but an error occurred when receiving data” there is little to go on to resolve this.

To say it’s a winsock error is great and all, but we need more to go on to get to root cause. Restarting the machine obviously clears something up, but we need to know what that something is. Any MS hotfixes anyone recalls for winsock errors with BES client?

How many clients are connected to the same relay? I’d look for errors in the Event Log on the relay, specifically anything calling out tcp/ip port exhaustion.

Also use 'netstat -ano | find "52311"' to see how many tcp ports are active from BigFix on the relay. If it’s in the range of 2k sockets, you might check the value of _BESRelay_HTTPServer_MaxConnections (which defaults to 2048 on Windows or only 512 on Linux), along with other “high-volume relay” recommendations from the Capacity and Planning guide.
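To make the netstat count easier to compare against the connection limit, the output can be tallied by TCP state. The following is a rough sketch, not a BigFix tool: the function name `count_port_states` and the sample lines are made up for illustration, and it assumes the usual column layout of 'netstat -ano' on Windows or 'netstat -an' on Linux.

```python
from collections import Counter

# State names as printed by Windows and Linux netstat (spellings differ slightly).
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "LISTEN", "LISTENING",
    "SYN_SENT", "SYN_RECV", "SYN_RECEIVED", "FIN_WAIT1", "FIN_WAIT2",
    "FIN_WAIT_1", "FIN_WAIT_2", "LAST_ACK", "CLOSING", "CLOSED",
}

# Hypothetical sample output; real input would come from netstat itself.
SAMPLE = """
  TCP    10.0.0.5:52311     10.0.0.9:49812     ESTABLISHED     1234
  TCP    10.0.0.5:52311     10.0.0.7:50233     TIME_WAIT       0
  TCP    10.0.0.5:445       10.0.0.7:50234     ESTABLISHED     4
tcp        0      0 10.0.0.5:52311     10.0.0.8:40000     ESTABLISHED
"""


def count_port_states(netstat_text: str, port: int = 52311) -> Counter:
    """Count TCP lines that mention the given port, grouped by state."""
    suffix = f":{port}"
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP entries.
        if not fields or not fields[0].lower().startswith("tcp"):
            continue
        # Keep only lines where a local or remote address uses the port.
        if not any(field.endswith(suffix) for field in fields):
            continue
        state = next((f for f in fields if f in TCP_STATES), "UNKNOWN")
        counts[state] += 1
    return counts


if __name__ == "__main__":
    states = count_port_states(SAMPLE)
    print(dict(states), "total:", sum(states.values()))
```

A total approaching the _BESRelay_HTTPServer_MaxConnections value mentioned above would be the signal to look at the high-volume relay recommendations.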

I never did make it back to my computer, but the long numeric error code is a misrepresentation based on the client reporting an unsigned integer value. You could find a “two’s complement calculator” to translate that into an error code searchable from Winsock.
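The two’s complement translation can also be done without an online calculator. This is a small sketch that assumes the client prints a negative 32-bit value as unsigned; the helper name `to_signed32` is made up for illustration.

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # If the sign bit is set, subtract 2**32 to get the negative value.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    for code in (4294967286, 4294967290):
        print(code, "->", to_signed32(code))
    # 4294967286 -> -10
    # 4294967290 -> -6
```

The magnitude 10 appears to line up with the “10 err_SOCKET_RECEIVE” entry quoted earlier in this thread.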

The relay in question has about 1,200 active clients. The relay is OEL 7.1 VM, 2CPU, 4GB RAM.

Looking at connections, 'netstat -an | grep -i 52311 | wc' only returns anywhere from 80 to 200, so I wouldn’t think the OS has any port exhaustion problems.

The BESRelay.log file doesn’t show any errors from the time range when the client had the issue.
Mon, 27 May 2019 18:01:58 +0000 - 1834936064 - 403: 20NoMastheadMatchesURL
Mon, 27 May 2019 18:20:08 +0000 - 1834936064 - 189: 17NotASignedMessage

Ok, that all looks good. I’m out of suggestions…I think you said you had a PMR open, right? I’d keep following up with support.

We really need good direction from support and are challenged in this area. We have tried the 10000 debug log and sent event tracing data for winsock communications, but we don’t get to root cause. A reboot of the server in some cases makes the issue go away, so that tells us it’s not network related but something corrupted on the client in the BES code or somewhere in the TCP stack. You can also see our frustration, as this has been going on for almost 2 years.

Not sure whether it works in your case, but here is what I did when I received the winsock “4294967290” error. Upon further troubleshooting on the client side, I found that the client wasn’t able to make connections using the FQDN of the BigFix/relay server, while with the IP address it worked fine with no errors. I made a local hosts entry on the client, and it works.
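A quick way to check whether an affected client can resolve the relay’s name at all is to ask the operating system resolver directly. This is a minimal sketch with a made-up helper name, `resolve_addresses`; it only tests name resolution, not whether the BigFix client itself would succeed.

```python
import socket


def resolve_addresses(hostname: str, port: int = 52311) -> list:
    """Return the sorted, de-duplicated IP addresses the OS resolves for hostname.

    An empty list means the name could not be resolved on this machine.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "relay1" is a placeholder name, as in the log excerpts above.
    print(resolve_addresses("relay1"))
```

If the list comes back empty for the FQDN while the relay answers by IP address, that would point toward name resolution on the endpoint rather than the relay.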

Unfortunately, with tens of thousands of clients, that won’t be doable for us. The rotten thing is that if you restart the machine, it contacts the relay just fine.

Instead of restarting the entire machine, do things resolve if you simply restart the Client service?

No, they typically don’t unless a different relay is auto selected but that rarely happens.