Top-Level Relays

In our case, High Availability comes from defining a Relay1, a Relay2, and a list of Tertiary Relays on each system that is running the Relay Service.

Beyond that, you have to monitor your environment. We use monitoring software to alert us when certain processes (such as BESRelay) are not running on Relay machines, or when a machine has been unresponsive for several minutes (we try to set the threshold to double the average time for a reboot). In our case, a HelpDesk ticket is automatically generated and assigned to our group for remediation.
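As a rough illustration of the "double the average reboot time" rule of thumb, here is a minimal Python sketch. The function names and the sample reboot durations are hypothetical and not taken from any particular monitoring product.

```python
from statistics import mean
from typing import Sequence


def unresponsive_threshold(reboot_seconds: Sequence[float], factor: float = 2.0) -> float:
    """Return the alert threshold: `factor` times the average observed reboot time."""
    if not reboot_seconds:
        raise ValueError("need at least one observed reboot duration")
    return factor * mean(reboot_seconds)


def should_alert(seconds_unresponsive: float, reboot_seconds: Sequence[float]) -> bool:
    """Alert only once the machine has been down longer than the threshold."""
    return seconds_unresponsive > unresponsive_threshold(reboot_seconds)
```

With observed reboots of 180, 240 and 300 seconds, the threshold works out to 480 seconds, so a normal reboot would not open a ticket.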

The relay fail-over paths are manually maintained. Putting a relay into production is not a haphazard process.

About as close as we can get to Automatic Relay selection is the Tertiary List. On my off-site relays, we set this to include all of our Data Center relays, but it’s a manual thing. When we add a new Top Level Relay, we have to remember to Stop/Update/Redeploy the Action that sets the Tertiary Relay list for off-site Relays.

After testing your suggestion, that doesn’t seem to tell my relay service not to select the core server as its parent. I ran the relay auto select and restarted both services but the relay continues to select the core. I don’t think the “Automatic” selection method applies to parent relay selection since the setting is specifically for the BESClient.

I think you are correct about that; I was mistaken. @TimRice’s post quotes the document from IBM that explains this.

Well, I have a lab set up with IEM at version 9.2 and it looks like what @TimRice has described seems to be working. Maybe it’s something that version 9.1.1229 (our production version) cannot do?

@jmaple I don’t remember when IBM added the Tertiary Relay list function.
The failover from Relay1 to Relay2 has been in the product for as long as I have used it (circa 2003). It’s the reason both settings are there.

Well the funny thing is, I’m not using that setting. I simply did as @jgstew first suggested and made my top-level relay advertise a top level and had my subrelay seek it. It seems that was enough. Not sure why that doesn’t work on 9.1.1229.

Now I’m even more confused.

It always made sense to me that relays selecting other relays could work exactly the same way as clients selecting relays. The only problem would be a relay that can’t find any relays in its affiliation group and then fails over to a random one, because then you could end up with relays in a never-ending circle.

If I recall correctly, a Relay will never use Automatic Relay Selection. Tim was correct in his earlier post that you need to configure these settings:

__RelayServer1
__RelayServer2
_BESClient_RelaySelect_TertiaryRelayList
_BESClient_RelaySelect_FailoverRelay or _BESClient_RelaySelect_FailoverRelayList

…except that these need to be configured on every relay.

__RelayServer1 and __RelayServer2 can be configured by right-clicking on the Relay’s computer account in the console and selecting “Properties” or “Edit Settings”. But I don’t think that dialog has places to put in the TertiaryRelayList or the FailoverRelay / FailoverRelayList settings; you’ll need to use the fixlets from the BES Support site or your own Task to configure those.

I’m not sure whether RelaySelect_Automatic needs to be set to 0; I think that setting is ignored on Relays anyway.

See https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Relay%20Affiliation
BigFix Relay Behavior

BigFix Relays themselves do not use automatic relay selection when deciding which parent BigFix Relay or BigFix Server to use so the BigFix Relay affiliation process does not apply when BigFix Relays pick their parent. BigFix Relays will use the standard manual relay selection and failover behavior.

Also, according to https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Tivoli%20Endpoint%20Manager/page/Autoselection%20Failsafe%20Controls , the child relay must get a reply to its ICMP requests to the parent relay in order to select it (I didn’t realize that was a requirement with Manual Relay Selection, so I thought I’d point that out).

According to
https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/Tivoli%20Endpoint%20Manager/page/Configuration%20Settings
if the ICMP from child relay to parent relay is blocked, you could use _BESClient_RelaySelect_FailoverRelayList if your relays are 9.0+ or _BESClient_RelaySelect_FailoverRelay at version 5.1+.

It appears that only the FailoverRelay settings will achieve Relay selection without ICMP replies.

But that’s not what I’ve just experienced in my lab. It may have been because the client of the sub-level relay was pointed at the top level relay BEFORE installing the relay on the sub-level relay client. Since I did that, the relay now uses the top level relay as the parent of the sub-level relay which is what I want to happen in our production environment. There really isn’t an issue with ping in our environment especially since the name of the relay is the FQDN.

The problem I have is that I assigned the two top-level relays manually before, but the parent relay didn’t change no matter how many times I restarted the client or ran the manual selection. I couldn’t get clients (even clients sitting on the same subnet) to take the relay as a parent using manual selection.

Is there a host-based firewall, perhaps blocking the ICMP on the relay? I’d suggest running a network capture from the child relay during a relay selection.

In any case, in my “bag of dirty tricks”, I’ve had situations when cloning machines with a preinstalled copy of the BES client, where the target machine cannot reach any of the relays that existed at the time the image was created; so it cannot learn about the newer relays installed at the client site. In that situation, I’ve updated the HOSTS file on the client so that it thinks the root server has the IP address of the nearest relay. You might try modifying the client hosts file (or even your DNS infrastructure) to assign the root server name(s) to the top-level relay IP addresses.

I think I recall @jgstew posting something similar here…was it called a shadow root? Only your Console machines and top-level relays need to correctly resolve the root servers, so those could have “correct” HOSTS file entries, to override the “false” DNS results that are used by the rest of the infrastructure.
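To make the HOSTS file trick concrete, here is a minimal Python sketch that builds the line you would add. The IP address and root server name in the usage note are hypothetical placeholders; the only thing assumed about the file format is the standard "address, then whitespace-separated names" layout.

```python
import ipaddress
from typing import Sequence


def hosts_entry(relay_ip: str, root_names: Sequence[str]) -> str:
    """Build a HOSTS line that maps the root server name(s) to a relay's IP."""
    ipaddress.ip_address(relay_ip)  # raises ValueError if this is not a valid IP
    names = [name.strip() for name in root_names if name.strip()]
    if not names:
        raise ValueError("at least one root server name is required")
    return relay_ip + "\t" + " ".join(names)
```

For example, `hosts_entry("10.0.0.5", ["bigfix-root.example.com"])` gives a line that points the (made-up) root name at the relay on 10.0.0.5. Console machines and top-level relays would keep entries pointing at the real root server.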

Fake Root is what it has been called, but I suppose there is no official term.

You can do this also if you have control over the DNS of the machines at the remote location. You can just set the DNS they use to resolve the root server as a local relay. You probably could even use DNS load balancing to have the DNS resolve the hostname of the local relays randomly instead of always returning the same IP, though it would need to return the same IP for the same connecting client for things to really work well.

The “Fake Root” concept goes even further and has the root DNS name ALWAYS resolve to a single top level relay instead of the root server. This allows you to more easily restrict access to the root server. It also allows you to keep load off of the root server and put as much load as possible on this single top level relay that can be swapped out much more easily than the root server. You can put a lot of storage on the cache of this top level relay and set it to do internet downloads so that they are not done by the root and reduce the storage needs of the root server. You can also have this top level relay be the one responsible for decrypting reports. You could set it to hold onto more reports for longer so that it can send them up to the root in larger batches to further reduce load on the root server.

Looks like it’s time for me to add some background info.

Relays do not perform any relay selection. The accompanying client performs this work for the relay. The client will obviously use the relay it is installed with, but it also finds the parent that the relay should talk to and instructs the relay to talk to that parent. By default, auto relay selection is NOT enabled when a client has a relay attached. There is another setting to do that, but it isn’t something we recommend, so I’m not going to mention the setting.

If you look in the logs of the recent clients (like 9.2) you will see the relay selection of the client show that it is using the localhost but has chosen its parent.

So, there are still cases where the client of an installed relay will fail over to root, but it has to go through all of these first. As the comment on one of the settings states, manual selection goes in this order: primary/secondary/tertiary list/failover/root

  _BESClient_RelaySelect_TertiaryRelayList 
 Type:  String 
 Version:  7.0 
 Platform:  All 
 Default:   
 Requires Client Restart:  NO 
 Description:  semi-colon delimited list of relays to try.  
Manual selection goes in this order, primary/secondary/tertiary list/failover/root 

  _BESClient_RelaySelect_FailoverRelay 
 Type:  String 
 Version:  5.1 
 Platform:  All 
 Default:   
 Requires Client Restart:  NO 
 Description:  failover relay used if configured and nobody is responding to pings. Like __RelayServer1, it should be of the form http://server:52311/cgi-bin/download 


  _BESClient_RelaySelect_FailoverRelayList 
 Type:  String 
 Version:  9.0 
 Platform:  All 
 Default:   
 Requires Client Restart:  NO 
 Description:  A semicolon delimited list of failover relay names used if configured and nobody is responding to pings. If present and not empty, it replaces _BESClient_RelaySelect_FailoverRelay
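The order described in those setting comments can be sketched in Python as follows. This is only an illustration of the documented order (primary/secondary/tertiary list/failover/root), not the client's actual code. The function names are hypothetical, entries are kept in the order they are written (which the descriptions above do not confirm), and the sketch ignores the ping checks that decide when the client actually moves on to the next candidate.

```python
from typing import List, Mapping


def split_list(value: str) -> List[str]:
    """Split a semicolon-delimited setting value, dropping empty entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


def manual_selection_order(settings: Mapping[str, str], root: str) -> List[str]:
    """List parent candidates as primary/secondary/tertiary list/failover/root."""
    order: List[str] = []
    for name in ("__RelayServer1", "__RelayServer2"):
        value = settings.get(name, "").strip()
        if value:
            order.append(value)
    order.extend(split_list(settings.get("_BESClient_RelaySelect_TertiaryRelayList", "")))
    # A non-empty FailoverRelayList replaces the single FailoverRelay setting.
    failover_list = split_list(settings.get("_BESClient_RelaySelect_FailoverRelayList", ""))
    if failover_list:
        order.extend(failover_list)
    else:
        single = settings.get("_BESClient_RelaySelect_FailoverRelay", "").strip()
        if single:
            order.append(single)
    order.append(root)  # the root server is always the last resort
    return order
```

Note that the root always ends up at the end of the list, which is why these settings alone cannot keep a relay's client from ever reaching the root.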

You obviously can still end up with cases where a client will choose the root as its parent, and a Fake Root (or impersonating root, or whatever you want to call it), as @jgstew mentions, is the only way to prevent that. A top-level relay masquerading as the server to everyone but the consoles and that relay itself is the only real way of preventing clients from getting to the root, either by failing over or at initial install.

Sorry I’m late to the table…

I’ve been using relay auto selection along with appropriate seek and advertisement lists for over 8 years and have never had issues. I set the following settings on the relays.

_BESClient_Relay_EnableAutomaticSelection

  • This is the setting @AlanM is keeping secret (sorry). This tells the client with a relay to allow it to use automatic relay selection

_BESRelay_Register_Affiliation_AdvertisementList

  • This is what the relay advertises to clients or other relays

_BESClient_Register_Affiliation_SeekList

  • This is the seek list the client and relays will use. I also set this setting on all the clients to specify their seek list.
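For readers who have not used affiliation, here is a minimal Python sketch of how a seek list could be matched against advertisement lists. It assumes both settings are semicolon-delimited group names and that groups earlier in the seek list are preferred; those matching rules, the function names, and the relay and group names in the usage note are assumptions for illustration only, not the product's actual selection logic.

```python
from typing import Dict, List


def parse_groups(value: str) -> List[str]:
    """Split a semicolon-delimited affiliation list into group names."""
    return [group.strip() for group in value.split(";") if group.strip()]


def candidate_relays(seek_list: str, advertisements: Dict[str, str]) -> List[str]:
    """Return relays advertising the first seek-list group that any relay advertises."""
    for group in parse_groups(seek_list):
        matches = sorted(
            relay
            for relay, advertised in advertisements.items()
            if group in parse_groups(advertised)
        )
        if matches:
            return matches
    return []  # no affiliated relay found; what happens next is outside this sketch
```

With relays advertising `DataCenter`, `SiteA;DataCenter` and `SiteB`, a seek list of `SiteA;DataCenter` would pick only the relay that advertises `SiteA`.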

Not trying to keep anything secret, but be aware when using this that in some circumstances you will end up with circular relay selections. The client does its best to detect this but if you have relay C pointing to relay B pointing to relay A pointing to relay C because they are all equally good relays… you can see the problem.

Again, as with any setting we have, these override default behaviour for certain circumstances and should be considered carefully before you use them.
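If you want to audit your own environment for the C to B to A to C situation described above, a minimal Python sketch could look like this. It assumes you have already exported a mapping of each relay to its currently selected parent; the mapping, the relay names, and the function name are hypothetical, and this is not how the client itself detects loops.

```python
from typing import Dict, List


def find_cycle(parents: Dict[str, str], start: str) -> List[str]:
    """Follow parent links from `start`; return the loop if one exists, else []."""
    position: Dict[str, int] = {}
    path: List[str] = []
    node = start
    while node in parents:
        if node in position:
            # We have come back to a relay already visited: that is the loop.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = parents[node]
    return []  # the chain ended at a node with no parent (for example, the root)
```

Calling it with `{"C": "B", "B": "A", "A": "C"}` and a start of `"C"` returns the loop, while a chain that ends at the root returns an empty list.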

Hello Alan,
For the tertiary relay list, are the relays tried in the order in which they are written?
Or is a relay chosen at random from those in the tertiary list?

@jmaple, are you just trying to set permanent parent relays for your subrelays? I’m not sure why you would want relays to be set to Automatic; this would cause adverse effects in my environment, and having these set to manual is what gives me control over the flow of endpoint communication. We are using relay affiliation with the Seeklist and Advertisement settings as mentioned above, targeted at “All Computers”, and we base our Advertisement lists on which clients we want each relay to communicate with directly. For the parent relay selection, we just edit the computer settings of the Relay, then set a PRI and BACK and call it a day. What is the need for auto relay selection for a relay in your environment? Also, as @jgstew was stating, why so many relays in such a small environment? I think the max is 1k clients per relay.

@heagsta: As @AlanM has stated, relays do not perform the automated selection like normal clients.

Our advertisement list is working like a charm so I’m not concerned with that so much. At this point my issue was resolved by forcing my relays to be manually set to their respective top-level relays. I was not part of the initial deployment of our infrastructure so we have quite a few relays where it’s not necessary and while I could get rid of half of them, everything is working and I don’t see a reason to rock the boat. I just wanted to be sure that when I enabled encryption, my top-level relays were the decryption relays and my main server didn’t have to decrypt reports.

@jmaple, I understand that relays do not perform automatic relay selection, but throughout the thread, I thought you were looking for a more complex solution for a problem I wasn’t understanding fully, instead of just manually assigning the parent relays. Either way, at least it is working as advertised now. Good info on the decryption, I have not worked with that as of yet.

There are times when it makes sense to add a relay for reasons other than the number of clients connecting, like to help with UDP notifications and WAN bandwidth.

Let me put it this way; we have 20+ relays where 90% of the environment is sitting in the same physical building. I could stand to lose a couple.