Client not falling back to main URL when relay is not available

D.Dean · June 21, 2022, 11:58pm

We are an MSP with many customers. Some of them image their systems and then ship them out to fully remote users. During the install, a script builds the clientsettings.cfg on the fly with known reachable relays so that the client can connect and register.
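For readers unfamiliar with this kind of install-time script, here is a minimal sketch of generating such a file. It assumes the manual relay keys `__RelayServer1`, `__RelayServer2` and `__RelaySelect_Automatic`; the function name and relay URLs are hypothetical placeholders, not taken from the poster's environment.

```python
def build_client_settings(relays, automatic=False):
    """Return clientsettings.cfg text for one or two manually assigned relays.

    The key names are an assumption about the config format; the relay
    URLs passed in are supplied by the caller.
    """
    if not 1 <= len(relays) <= 2:
        raise ValueError("expected one or two relay URLs")
    lines = []
    for index, url in enumerate(relays, start=1):
        # Primary relay becomes __RelayServer1, secondary __RelayServer2.
        lines.append(f"__RelayServer{index}={url}")
    # 0 keeps the manually assigned relays, 1 enables automatic selection.
    lines.append(f"__RelaySelect_Automatic={1 if automatic else 0}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical internal relays, for illustration only.
    print(build_client_settings([
        "http://relay1.example.internal:52311/bfmirror/downloads/",
        "http://relay2.example.internal:52311/bfmirror/downloads/",
    ]))
```

Relays written this way are only reachable from inside the network, which is what sets up the problem described next.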

When the remote user gets the system, the client tries to use the two internal relays, which are now unreachable. Usually when this occurs, the client connects back to the main URL of our root server (from the __Relay_Control_RootServer value). This main URL resolves differently from inside the networks than it does from outside. From outside, it resolves to one of four internet-facing relays. Usually the system has the private key (cert) to communicate with these authenticating relays and we have no issues.

Today, however, I have two customers whose clients are looking for the two relays from the config, cannot reach them, and are NOT defaulting back to the __Relay_Control_RootServer root server value. This caused them to go offline and eventually be removed by the computer cleanup process of the BES admin tool.

Why would a system keep trying to use relays it does not have access to and not default back to the system listed in __Relay_Control_RootServer?

The logs do show winsock errors for the internal relays, which is expected because the clients are not able to resolve or reach them.

Please help

JasonWalker · June 22, 2022, 12:21am

My first couple of guesses…

You said the masthead name resolves differently for internal vs external clients. Is that a public DNS name, or does it depend on customer provided DNS aliasing? Is it possible these clients are not resolving the masthead name at all?
Have you applied the “Last Fallback Relay” masthead option, using BESAdminTool? If so, that Last Fallback Relay value is used instead of the masthead root name.

If you can access the machines at all, I’d recommend adding _BESClient_RelaySelect_FailoverRelayList values, since that allows adding multiple relays (Internal;VPN;DMZ;Public). They are tried in the order specified, after Automatic/Manual Relay Select fails, and before trying the “Last Fallback Relay” or masthead name.
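The order described above can be modeled roughly as follows. This is only an illustration of the sequence as stated in this reply, not the client's actual implementation; the function name and the sample hosts are hypothetical.

```python
def relay_candidates(selected, failover_list, last_fallback, masthead_name):
    """Return relays in the order the reply says they are tried.

    selected       -- relays from Automatic/Manual Relay Select
    failover_list  -- semicolon-delimited failover list value ("" if unset)
    last_fallback  -- the "Last Fallback Relay" masthead option, or None
    masthead_name  -- root server name from the masthead
    """
    candidates = list(selected)
    # Failover entries are tried in the order specified.
    candidates += [r.strip() for r in failover_list.split(";") if r.strip()]
    # A configured Last Fallback Relay replaces the masthead name.
    candidates.append(last_fallback if last_fallback else masthead_name)
    return candidates


if __name__ == "__main__":
    # Hypothetical values: an internal-only fallback hides the masthead name.
    print(relay_candidates(
        ["relay1.example.internal", "relay2.example.internal"],
        "",
        "10.0.0.5",
        "bigfix.example.com",
    ))
```

With an internal Last Fallback Relay and no failover list, the masthead name never appears among the candidates, which would match the behavior reported in the original post.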

D.Dean · June 22, 2022, 1:05am

If you ping the same URL from inside and outside, it resolves to different IPs: internally to an internal IP, and externally to an internet-facing relay. This has worked very well.

However, I now know, thanks to you, that the issue is the Last Fallback Relay. Yes, we set it, and that relay is internal; it happens to be the internal relay these systems are stuck trying to connect to. It is an IP address, not a URL.

So my last question would be, is there a reg key we can change on the endpoints to remove the last fallback relay without requiring them to check in and get it from the console?

It would be difficult to connect to 300+ machines manually to get them to check in somehow. They can use something like Intune to deploy a key change though.

Is there some way to get the besclient.exe -register to work? When we tried it the return was that the certificate already existed so the relay rejected the registration.

JasonWalker · June 22, 2022, 2:14am

You could deploy the reg key for the BESClient setting to apply the _BESClient_RelaySelect_FailoverRelayList client setting. That should be effective the next time the client goes through relay selection (which is probably at the maximum 2 hour failure interval now) or when the BESClient service restarts.

You could even use that to put in multiple values - like your ‘last fallback relay’ followed by the masthead name that resolves to the two internal/external addresses, so clients that can reach the fallback relay will prefer it and then try the masthead name if it’s unreachable.

Give that a test

JasonWalker · June 22, 2022, 2:17am

The client settings on Windows are controlled through the registry - at HKLM\Software\Wow6432Node\BigFix\EnterpriseClient\Settings\Client
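As a rough sketch of what such a deployment could carry, the helper below builds the text of a .reg file for the failover list. It assumes each client setting is stored as a subkey named after the setting, holding a string named "value"; verify that layout on a working client before deploying. The function name and relay entries are hypothetical, and the accepted entry format (hostname, IP or full URL) is not confirmed here.

```python
SETTINGS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node"
    r"\BigFix\EnterpriseClient\Settings\Client"
)
SETTING_NAME = "_BESClient_RelaySelect_FailoverRelayList"


def failover_reg_file(relays):
    """Return .reg file text that sets the failover relay list.

    Assumes the setting lives in a subkey with a string named "value".
    """
    value = ";".join(relays)  # relays are tried in the order given
    # Escape backslashes and quotes as .reg string syntax requires.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{SETTINGS_KEY}\\{SETTING_NAME}]\n"
        f'"value"="{escaped}"\n'
    )


if __name__ == "__main__":
    # Hypothetical entries: the internal fallback first, then the masthead name.
    print(failover_reg_file(["10.0.0.5", "bigfix.example.com"]))
```

The resulting file could be pushed through a tool such as Intune, followed by a BESClient service restart if waiting for the next relay selection attempt is not acceptable.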