Top-Level Relays

jmaple · June 2, 2015, 4:16pm

I’m trying to create a couple of top-level relays within our deployment and I want all other relays to report through their parents instead of reporting directly to the core. What is the setting that tells the relay application who its parent relay is?

jgstew · June 2, 2015, 4:26pm

It should be the BES Client relay selection settings.

If you are using automatic relay selection then the top level relays could advertise TopLevel affiliation group, and the sub-relays would use that affiliation group for their selection. No clients would use the TopLevel group.
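As a rough sketch of that idea, the two sides could be configured with actions along these lines. The group name `TopLevel` comes from the suggestion above. Splitting the work into two separately targeted actions is an assumption, and the setting names are the standard relay affiliation settings. Whether the client on a relay machine actually honors a seek list is debated later in this thread, so this is something to test in a lab first.

```
// Action 1, targeted at the top-level relays only: advertise the TopLevel group.
setting "_BESRelay_Register_Affiliation_AdvertisementList"="TopLevel" on "{now}" for client

// Action 2, targeted at the sub-relays only: seek the TopLevel group.
setting "_BESClient_Register_Affiliation_SeekList"="TopLevel" on "{now}" for client
```

Ordinary clients would keep a seek list that does not mention `TopLevel`, so they would not be drawn to the top-level relays through this group.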

jmaple · August 27, 2015, 4:16pm

I tried this, but the setting “_BESRelay_PostResults_ParentRelayURL” doesn’t change. The client attempts relay selection, but since it has the relay installed, it just reports to itself. That’s the behavior I expected from your recommendation, but I figured it was worth a try.

jmaple · August 27, 2015, 4:28pm

There actually isn’t much documentation I can find on relay hierarchy, specifically on how relays can be configured to report to each other in a high-availability (HA) setup. I can tell my relays to report directly to one relay, but if that relay is down, nothing under that relay gets reported on either.

My scenario is this: We have 4 separate environments with 2 relays placed at the top of each one (at least that is the goal). We have a couple sub-relays in those environments that I want to report to their respective environment top level relays. When that is completed, the top level relays should also report to another set of relays that will then send all traffic to the core. It seems there is no automatic way of doing this or configurable relay settings that allow for high availability or automatic failover. I wonder why not?

TimRice · August 28, 2015, 1:55pm

Take a look at the _BESClient_RelaySelect_TertiaryRelayList setting.

_BESClient_RelaySelect_TertiaryRelayList
When using manual relay selection, this setting is a way to specify a list of failover relays to choose from when the primary and secondary relays are not reachable. This setting is a semi-colon delimited list of relays to try. Manual selection goes in this order, primary/secondary/tertiary list/failover/root. For automatic relay selection you should look at the document on relay affiliation. (Example: relay1.company.com;192.168.123.32;relay2.company.com)
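For illustration, an action that sets this list on a machine using manual relay selection might look like the sketch below. The relay names are the placeholders from the example above, not real hosts.

```
// Semicolon-delimited list of relays to try once the primary and secondary relays are unreachable.
setting "_BESClient_RelaySelect_TertiaryRelayList"="relay1.company.com;192.168.123.32;relay2.company.com" on "{now}" for client
```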

jmaple · August 28, 2015, 2:09pm

Does that work for relays choosing their parents or is that only for clients?

TimRice · August 28, 2015, 2:26pm

I believe it’s always the client choosing its Parent Relay, but Clients with a Relay service installed are prevented from using Automatic Relay Selection, to avoid Relay Loops. So you have to use the Primary/Secondary/Tertiary Relay settings, and the Tertiary list is a new setting that allows “additional” relays to be defined.

I have not used this setting yet, but I noticed it a while back.

jmaple · August 28, 2015, 2:42pm

Well, I ask because we are going to open up our external firewall to allow management of our distributed laptops over the internet, and we will be enabling encrypted reporting for our environment. We want to limit which relays receive the private key, so that decryption happens on only two of our relays, which everything should report through. However, since the other relays don’t report to those two as parents, they would have to handle decryption as well, and I feel that’s too big a footprint for our private key. I say this being in an environment with fewer than 4k machines.

jgstew · August 29, 2015, 3:40am

Could you diagram this or something? I’m having trouble following the flow. Is this just 1 root server?

I don’t have enough experience with lots of relays other than using them and seeing how they are set up.

I don’t understand why you can’t use relay auto selection for relays at one tier to choose the tier above them, for as many tiers as there are.

CC: @rnkatpsu

jmaple · August 29, 2015, 4:05am

So in our environment we have one core server. Right now we have 20 relays that are servicing our 4k clients. These clients are separated by domain. Each domain has two top level relays that we want reporting to two others in our production environment which will be the only two passing any traffic back to the core or at least that’s the plan.

The automatic relay selection doesn’t work because that is meant only for clients. Since the client has the relay installed, it’s the closest so the client will not be assigned another relay. If I want to change the parent relay, it has to be done with the setting I mentioned before.

jgstew · August 29, 2015, 5:26am

Domain? Like AD? How many geographically separated locations?

With fast NICs and dedicated machines/VMs you should be able to handle 4k clients with 2 relays if there weren’t any WAN/LAN constraints.

Are your Relays VMs? Server hardware? Non-server hardware? Are they used for anything else?

I clearly don’t get how relays select other relays. I’ll have to dig into that.

You could have the BESClient set the relay that the relay should use dynamically and have it switch if there is a failure. As long as it is a policy action it will work.

TimRice · August 30, 2015, 2:28am

Quoting from [HERE under BigFix Relay Behavior][1]…

BigFix Relays themselves do not use automatic relay selection when deciding which parent BigFix Relay or BigFix Server to use so the BigFix Relay affiliation process does not apply when BigFix Relays pick their parent. BigFix Relays will use the standard manual relay selection and failover behavior.

You need to configure the Parent Relay using the Client Settings …

__RelayServer1
__RelayServer2
_BESClient_RelaySelect_TertiaryRelayList
_BESClient_RelaySelect_FailoverRelay or _BESClient_RelaySelect_FailoverRelayList
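A minimal sketch of an action that pins a sub-relay to two parents plus a last-resort failover, assuming manual relay selection. The hostnames under `example.com` are hypothetical, and the URL form mirrors the one used in the action later in this post.

```
// Put the client on the relay machine into manual relay selection.
setting "__RelaySelect_Automatic"="0" on "{now}" for client

// Primary and secondary parent relays (hypothetical hostnames).
setting "__RelayServer1"="http://toplevel1.example.com:52311/bfmirror/downloads/" on "{now}" for client
setting "__RelayServer2"="http://toplevel2.example.com:52311/bfmirror/downloads/" on "{now}" for client

// Last stop before the client falls back to the root server (hypothetical hostname).
setting "_BESClient_RelaySelect_FailoverRelay"="http://prodrelay.example.com:52311/bfmirror/downloads/" on "{now}" for client
```

A tertiary list, if used, would be tried between the secondary relay and the failover relay.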

For a client outside the network, that will not likely be able to successfully make use of Automatic Relay Selection, you should configure _BESClient_RelaySelect_FailoverRelay. I currently have two DMZ Relays, and use a Task with the following Action to “Randomly” assign a client to a DMZ Relay (note there is no true RANDOM function in Relevance, so the ComputerID is as close as we can get) …

```
// This should result in a fairly even distribution between the two available DMZ Relay Servers.
setting "_BESClient_RelaySelect_FailoverRelay"="{item 1 of item 1 of (computer id mod 2, (0,"http://dmz01.fqdu.edu:52311/bfmirror/downloads/"; 1,"http://dmz02.fqdu.edu:52311/bfmirror/downloads/")) whose (item 0 of it = item 0 of item 1 of it)}" on "{now}" for client
```

  [1]: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Relay%20Affiliation

jgstew · August 30, 2015, 4:59am

Thanks for that explanation.

I don’t think @jmaple 's primary concern was random assignment, but high availability.

How do you get relays that are using one top-level relay to switch to the other if the one they are using goes down?

It seems like that could be achieved by having even computerIDs set TopLevelRelay1 as their Relay1 and TopLevelRelay2 as their Relay2 and have odd numbered computerIDs set TopLevelRelay2 as their Relay1 and TopLevelRelay1 as their Relay2. This would provide both High Availability (failover) while also providing semi-random distribution.
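A sketch of how that even/odd split might be written as a single action, assuming the two top-level relays are reachable at the hypothetical hostnames below and that the targeted machines use manual relay selection:

```
// Even computer IDs prefer TopLevelRelay1 and fail over to TopLevelRelay2.
if {computer id mod 2 = 0}
setting "__RelayServer1"="http://toplevelrelay1.example.com:52311/bfmirror/downloads/" on "{now}" for client
setting "__RelayServer2"="http://toplevelrelay2.example.com:52311/bfmirror/downloads/" on "{now}" for client
else
// Odd computer IDs get the reverse order.
setting "__RelayServer1"="http://toplevelrelay2.example.com:52311/bfmirror/downloads/" on "{now}" for client
setting "__RelayServer2"="http://toplevelrelay1.example.com:52311/bfmirror/downloads/" on "{now}" for client
endif
```

Each machine still has both relays configured, so losing one top-level relay should only shift load to the other rather than cut off reporting.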

jgstew · August 30, 2015, 5:04am

Also, you can get randomness through relevance, but you have to construct it from available sources, which is ugly and not cryptographically strong.

You can use the current time in seconds since epoch, number of lines of client log files, computerID, and potentially many other sources, mix them all together and then take the mod of them.

TimRice · August 30, 2015, 1:10pm

In our case High Availability comes from defining a Relay1, a Relay2, and a list of Tertiary Relays on each system that is running the Relay Service.

Beyond that, you have to monitor your environment. We use monitoring software to alert us when certain processes (aka BESRelay) are not running on Relay machines, or the machine is not responsive for several minutes (we try to set it to double the average time for a reboot). In our case a HelpDesk ticket is automatically generated and assigned to our group for remediation.

The relay fail-over paths are manually maintained. Putting a relay into production is not a haphazard process.

About as close as we can get to Automatic Relay selection is the Tertiary List. On my off-site relays, we set this to include all of our Data Center relays, but it’s a manual thing. When we add a new Top Level Relay, we have to remember to Stop/Update/Redeploy the Action that sets the Tertiary Relay list for off-site Relays.

jmaple · August 31, 2015, 1:00pm

After testing your suggestion, that doesn’t seem to tell my relay service not to select the core server as its parent. I ran the relay auto select and restarted both services but the relay continues to select the core. I don’t think the “Automatic” selection method applies to parent relay selection since the setting is specifically for the BESClient.

jgstew · August 31, 2015, 3:57pm

I think you are correct about that, I was mistaken. @TimRice 's post quotes the document from IBM that explains this.

jmaple · August 31, 2015, 5:43pm

Well, I have a lab set up with IEM at version 9.2 and it looks like what @TimRice has described seems to be working. Maybe it’s something that version 9.1.1229 (our production version) cannot do?

TimRice · August 31, 2015, 7:38pm

@jmaple I don’t remember when IBM added the Tertiary Relay list function.
The failover from Relay1 to Relay2 has been in the product for as long as I have used it (circa 2003). It’s the reason both settings are there.

jmaple · August 31, 2015, 7:40pm

Well the funny thing is, I’m not using that setting. I simply did as @jgstew first suggested and made my top-level relay advertise a top level and had my subrelay seek it. It seems that was enough. Not sure why that doesn’t work on 9.1.1229.