If the issue is not resolved by restarting the BigFix Client (but is by restarting the endpoint), it seems likely that the issue resides somewhere external to the Client, but on the endpoint itself.
When this issue occurs, prior to restarting the endpoint, have you performed network connectivity troubleshooting between the endpoint and the Relay in question? For example: DNS resolution, ping, telnet to the BigFix port, network packet captures, etc.?
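For what it's worth, the DNS and port checks can be scripted so they are quick to run while a client is actually misbehaving. This is only a rough sketch: `check_relay` is a made-up helper name, and the default port 52311 is taken from the relay URL quoted later in this thread, so adjust it if your deployment differs.

```python
import socket

def check_relay(host, port=52311, timeout=5.0):
    """Return (resolved_addresses, tcp_ok, detail) for a relay host.

    Hypothetical helper: step 1 is the DNS lookup, step 2 is a plain
    TCP connect (roughly what 'telnet host port' would tell you).
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [], False, f"DNS resolution failed: {exc}"

    # Collect the distinct IP addresses the name resolved to.
    addresses = sorted({info[4][0] for info in infos})

    try:
        with socket.create_connection((host, port), timeout=timeout):
            return addresses, True, "TCP connect succeeded"
    except OSError as exc:
        return addresses, False, f"TCP connect failed: {exc}"
```

Calling something like `print(check_relay("relay.example.com"))` (placeholder hostname) from the affected endpoint during a failure, and again once it clears, would at least show whether name resolution or the TCP connection is the part that breaks.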
The biggest issue we have is that we cannot reproduce this consistently. We cannot afford to turn on level 10000 debug client logging and have netmon tracing running on tens of thousands of clients. It can happen on one client and then, 5 minutes later, clear up on the same client with no restart. We can also restart the client, as previously mentioned, and then the issue goes away.
It’s one of the worst moving targets we have ever seen. We have had 2 cases open with IBM on this in the past and we simply gave up due to lack of root cause identification.
I wouldn’t suggest enabling debug logging on the Client to troubleshoot winsock errors as it will not give you additional information/details beyond the standard Client log. As suggested earlier, the winsock errors are indicative of a network communications issue that the Client encountered when attempting to perform a network operation.
If you’re seeing winsock errors in a Client log that clear up on their own without a restart of any kind, then for sure the issue is entirely external to the Client, and suggests intermittent network issues.
Some winsock errors (such as BAD_SERVERNAME - which is related to DNS resolution) might be addressed at the network level, or by different BigFix Client configurations. The rest tend to be network connectivity issues that need to be isolated when they occur, and can result from configurations/applications on the endpoint itself and/or network configurations/conditions.
I would suggest focusing troubleshooting efforts on specific instances, and testing network connectivity during these periods.
That’s what will be difficult. We don’t know which machine out of tens of thousands will act up next, and we cannot afford to turn on debugging on tens of thousands of machines.
As Aram states, turning on additional BESClient debugging is not likely to help the situation, as it’s a transient problem likely related to your client or the network, and not to BigFix itself.
You’ll need to be looking for other clues in the Event Logs of your server & relays, looking at your firewall/router logs, network switch logs, etc.
If left alone, does the client ever resolve itself without your intervention?
From your other post I saw there are ten network hops from your client to your relay; that is likely a case where I’d recommend locating a relay closer to your client, or enabling Persistent Connections from your client to the relay, depending on your network architecture.
What is your total deployment size, total number of relays, number of locations, number of endpoints per location?
You mentioned earlier that you need better guidance from support… To be clear, this is not a support forum but a peer enthusiast / self-help forum. I don’t know whether anyone from the Support team reads the forum, and if so it’s just on their own time - responding on the forum is not anyone’s job. For actual support, you’ll need to work with them through the PMR / Support Requests process. I’m honestly not sure how much that’ll help; diagnosing what’s going on with your network might be beyond their scope. The winsock errors could possibly be useful from a developer standpoint, but often a Winsock error just means “I could not connect”, which is less than helpful. The codewords that go with the number are often more useful than the number itself - SOCKET RECEIVE, or BAD SERVERNAME, etc.
BAD SERVERNAME usually means your DNS isn’t working properly. SOCKET RECEIVE usually means your network or operating system is dropping packets along the way. The first doesn’t usually fix itself, unless maybe you are seeing that while rebooting DNS servers; the second usually resolves itself with retries in time.
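If it helps, the codewords can be pulled out of the standard client logs without enabling any extra debugging. The sketch below is only an illustration: the regular expression is guessed from the log excerpts quoted in this thread, `tally_winsock_errors` and `describe` are made-up names, and the hints simply restate the rules of thumb above rather than any official mapping.

```python
import re
from collections import Counter

# Rules of thumb paraphrased from this thread; not an official mapping.
HINTS = {
    "BAD SERVERNAME": "DNS resolution problem",
    "SOCKET RECEIVE": "packets dropped by the network or operating system",
}

# Matches text such as: SOCKET RECEIVE (winsock error 4294967286
# (pattern inferred from the excerpts in this thread, so an assumption).
PATTERN = re.compile(r"([A-Z][A-Z_ ]*[A-Z]) \(winsock error (\d+)")

def tally_winsock_errors(lines):
    """Count winsock codewords found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Treat BAD_SERVERNAME and BAD SERVERNAME as the same codeword.
            counts[match.group(1).replace("_", " ")] += 1
    return counts

def describe(counts):
    """Render one summary line per codeword, most frequent first."""
    return [
        f"{word}: {n} ({HINTS.get(word, 'no hint in this thread')})"
        for word, n in counts.most_common()
    ]
```

Feeding it the lines of a client log (for example `tally_winsock_errors(open(path))`, with `path` pointing at wherever your client logs live) gives a quick count per codeword, which may be enough to tell a DNS pattern from a packet-loss pattern.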
I don’t want to hijack this thread, since I have my own going under “SOCKET RECEIVE (winsock error 4294967286)” (#20 by Aram), but all of your commands check out. My side has been trying to resolve this for almost 2 years, and we have been down the road of those commands over and over with no resolution.
A suggestion (IBM/HCL, are you listening?): if a GetURL is failing to register due to winsock errors, try a different relay.
My current winsock issue is winsock error 4294967286, but we get winsock error 4294967290 as well. If we ever fix one, we will most likely fix both. The message from the log is:
At 16:59:23 +0000 -
Error posting report to: ‘http://xxxxxxxxx.thomsonreuters.com:52311/cgi-bin/bfenterprise/PostResults.exe’ (General transport failure.
SOCKET RECEIVE (winsock error 4294967286)
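A side note on the numbers themselves: standard Winsock error codes sit in the 10000 range, so values like 4294967286 look more like small negative numbers printed as unsigned 32-bit integers. That reading is an assumption on my part, and nothing in this thread says what the resulting values mean inside the client, but the arithmetic is easy to check (`as_signed_32` is just an illustrative helper):

```python
def as_signed_32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value

for code in (4294967286, 4294967290):
    print(code, "->", as_signed_32(code))
# 4294967286 -> -10
# 4294967290 -> -6
```

This is one more reason the codeword in the log line tends to be more informative than the number.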
Relay select 0 vs 1 is “manual relay select” vs “automatic select”. It should have no bearing as this does not appear to be occurring during relay selection.
Is it normal to have ten network hops between the client and the relay? That’s a lot of hops, and a case where you might want to distribute a relay closer to the client.
What kind of logical distribution do you have (number of clients, number of relays, distance from client to relay)? Many customers who have the model of many small, distributed sites (e.g. mall storefronts) will add a relay to one of the workstations at each location.
Have you looked at the Persistent Connection options available as of 9.5.11 or 9.5.12?
I finally found the solution for this. Check the OS baseline policy to see whether TLS is enabled or blocked; if it is blocked, enable it. You can test this by turning off the baseline policy and then installing the client. Thank you.
@Krit Can you provide any more details around your TLS issue?
I have a client showing the error FAILED to Synchronize - General transport failure. - SOCKET RECEIVE (winsock error 4294967286- gather url
but it is also showing this error upon initial client registration: Failed automatic client authentication key exchange with server message: SSL protocol not supported.
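One way to narrow down an "SSL protocol not supported" style failure is to attempt a TLS handshake from the affected machine and see what, if anything, gets negotiated. The sketch below is hedged in several ways: `probe_tls` is a made-up helper, it assumes the host and port you pass actually speak TLS (the thread doesn't confirm which endpoints do), and it uses Python's own TLS library, so it exercises the network path and the server rather than any OS-level policy that restricts the system's TLS stack.

```python
import socket
import ssl

def probe_tls(host, port, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail)."""
    context = ssl.create_default_context()
    # Assumption: certificate verification is skipped on purpose, because
    # the question here is only whether a handshake can complete at all.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"negotiated {tls.version()} with {tls.cipher()[0]}"
    except ssl.SSLError as exc:
        # Reached the peer, but the TLS negotiation itself failed.
        return False, f"TLS handshake failed: {exc}"
    except OSError as exc:
        # Could not connect, or the connection dropped mid-handshake.
        return False, f"connection failed: {exc}"
```

If the TCP connection succeeds but the handshake fails, that points toward a protocol or policy mismatch (or something in the path interfering with TLS) rather than a plain connectivity problem.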
I know this is an older thread, but I wanted to mention that I have seen these errors too, and it turned out to be iBoss blocking the traffic. Content filters are not our friends in these cases.
I know… so many years passed…
My reason for the same EXACT error was… IP conflict with another machine… We had 2 machines with the same IP address.
As soon as the IP was changed to automatic, the problem disappeared.