Detected BES Relay Server Flow Flawed?

Yesterday I saw something I hadn’t come across before. I’ve raised a support ticket for it, but wanted to post it here too on the off chance that someone has seen or heard of a similar issue and might be able to help.

Our Scenario (Linux Servers):

Around 270 servers are showing in the Health Check dashboard as reporting directly to the root server.

When I investigated, I found they were all in the same geographical location. I started troubleshooting why they weren’t reporting to their local relay and found the following:

The DC has no DNS, so all name resolution is configured in the hosts file. When I checked the hosts file, there was an entry that pointed the root server’s URL at the relay I expected these servers to report to. That allowed them to register and essentially work. However, with that entry in place, nothing in BigFix reports the correct relay for those devices; they all show as pointing to the root server.

Our root server is configured by hostname, so when the agent registers, it sees that it is in location X and sets its seek list to location X, then Tier 1. However, as there is no DNS, it can’t resolve any of those relays, so it falls back to the root server. The hosts file points the root server’s name at a relay, so the connection succeeds, and all the besclient.config settings end up showing the root server.

The real problem is that I now have no idea which relay these servers are actually using, as it isn’t reported anywhere, and they show up as false positives that inflate the number of servers pointing directly to root.

I’ve got a few ideas about how to fix this, but it does show a flaw in how a server’s relay is detected and suggests that some cross-checks should be added. It probably works as designed, but I believe the design is the flawed part.

Is this a flaw in BigFix?

I read this as: the client believes it is reporting to <root_server> and reports that. It has no reason to suspect that <root_server> is actually <local_relay>.

If the clients were configured with a failover relay setting, they would first try to talk directly to the root, and when that failed, fail over to the local relay and report it correctly.

That’s the question I’m asking, I guess.

Working the way it does means I have no real way to know whether servers are actually pointing to the root server or to a relay. Even if I suspect it’s a relay, I can’t target actions by that relay, because it doesn’t show all the servers that ACTUALLY report to it.

The main problem is that the endpoint reports that it’s connected to a server it isn’t really connected to.

Unfortunately, that Relay property is retrieved from what the client thinks its relay is: basically the value of the client setting '__RelayServer1'.

It’s very common to “hide” the actual relay, with DNS or HOSTS file games (as you’ve seen), as well as with NAT, Load Balancers, Proxies, etc.

Using BigFix Query or a custom Analysis, see what 'name of selected server' returns for some of these clients. My recollection is that this can also be fooled by DNS/HOSTS, but I can’t verify that at the moment.

I do have some relevance, based on parsing RelayChain.txt on the client, that returns the real relay name and relay computer ID and is not fooled by these name games. I should be able to post it a bit later today.


Actually, that didn’t take so long. I would use this in an Analysis, reporting no more frequently than “Every 15 minutes”, since it parses all of the RelayChain.txt files on the client and can be a bit expensive to evaluate.

tuple string items (number of tuple string items of it - 1) of tuple string of tuple string items whose (it starts with "r:" or it starts with "s:") of tuple string of substrings separated by " - " of following text of last "|||" of ("|||" & it ) of concatenation "|||" of lines containing " - S - " of files of folders "RelayChain" of folders "__Global" of data folders of clients

Results are returned in the form of:

s:<computerid>(computername)
r:<computerid>(computername)

where “s:” indicates the client is connected directly to the root server, and “r:” indicates the client is connected to a Relay.

The computername returned is the real name of the relay/server, as the client on that relay/server resolves itself, so it should match the computer name displayed in the console. For example, the root server’s real hostname should be displayed here, not whatever CNAME alias the masthead may be configured to use.
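For anyone who wants to follow the relevance step by step, it roughly does this: take every line containing " - S - " from the files in the RelayChain folder, keep only the last such line, split it on " - ", keep the fields that start with "r:" or "s:", and return the last of those. The Python sketch below mirrors those steps purely as an illustration; it is not a BigFix tool. It assumes only what the relevance itself relies on (the " - S - " marker, " - " separators, and r:/s: prefixed hops), not the full RelayChain.txt format, and the helper names (last_hop, parse_hop, detect_relay) are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Matches "r:<computerid>(computername)" or "s:<computerid>(computername)".
HOP_RE = re.compile(r"^([rs]):([^()]+)\((.*)\)$")


def last_hop(lines: Iterable[str]) -> Optional[str]:
    """Mirror the relevance: keep lines containing " - S - ", take the
    last one, split it on " - ", keep fields starting with "r:" or "s:",
    and return the last such field (None if there is nothing to report)."""
    matching = [line for line in lines if " - S - " in line]
    if not matching:
        return None
    # strip() tolerates stray whitespace around each field.
    fields = [f.strip() for f in matching[-1].split(" - ")]
    hops = [f for f in fields if f.startswith(("r:", "s:"))]
    return hops[-1] if hops else None


def parse_hop(hop: str) -> Tuple[str, str, str]:
    """Split a hop into (kind, computer ID, computer name)."""
    match = HOP_RE.match(hop)
    if match is None:
        raise ValueError(f"unrecognised hop: {hop!r}")
    kind = "relay" if match.group(1) == "r" else "root server"
    return kind, match.group(2), match.group(3)


def detect_relay(relay_chain_dir: Path) -> Optional[str]:
    """Concatenate the lines of every file in the RelayChain folder, then
    apply last_hop. Files are read in sorted name order, which is an
    assumption: the relevance uses whatever order the client returns."""
    lines = []
    for path in sorted(relay_chain_dir.iterdir()):
        if path.is_file():
            lines.extend(path.read_text(errors="replace").splitlines())
    return last_hop(lines)
```

To try it against a real client, point detect_relay at the __Global/RelayChain folder under the client’s data folder (check the actual location with 'pathname of data folder of client' rather than assuming a path). Like the relevance, the sketch treats the last r:/s: field on the line as the client’s direct parent.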



Oh, I should also mention that 'ip address of selected server' is also valid for your scenario, where the name, but not the IP address, is being overridden.

I wrote the more complicated relevance above specifically to resolve relays that were behind WAN load balancers, where several different relays could masquerade behind a single IP address.
