Relay Failover behavior

Considering:


_BESClient_RelaySelect_FailoverRelay

This setting determines what the BigFix Client will do in the event that no BigFix Relays respond to TTL pings up to the maximum configured distance. In this event, the BigFix Client will attempt to register with the defined failover BigFix Relay before trying the BigFix Server. This setting was first introduced in BigFix 5.1.

The documentation for _BESClient_RelaySelect_FailoverRelay also points to
KB0023371: What manual Relay selection options do I have for my clients and Relays?
which includes this list:

The manual relay selection process uses these settings in the following order:

  1. Client attempts to connect to primary relay selection value (__RelayServer1) if set.
  2. Client attempts to connect to secondary selection value (__RelayServer2) if set.
  3. Client attempts to connect to tertiary selection (_BESClient_RelaySelect_TertiaryRelayList) if set.
  4. Client attempts to connect to failover selection (_BESClient_RelaySelect_FailoverRelay) if set.
  5. If the client fails to connect to any relay, it attempts root server selection.
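
To make sure I am reading that list correctly, here is a minimal Python sketch of the order as I understand it. The function name and the dict-based settings model are hypothetical (this is not BigFix client code), and I am assuming the tertiary list is semicolon-delimited like the failover list.

```python
# Illustrative sketch only: names and the settings dict are hypothetical,
# not part of the BigFix client.
def manual_selection_order(settings):
    """Return the targets a client would try, in the order listed in KB0023371."""
    order = []
    # Steps 1 and 2: primary and secondary relay, if set.
    for key in ("__RelayServer1", "__RelayServer2"):
        if settings.get(key):
            order.append(settings[key])
    # Step 3: tertiary list (assumed here to be semicolon-delimited).
    tertiary = settings.get("_BESClient_RelaySelect_TertiaryRelayList", "")
    order.extend(entry for entry in tertiary.split(";") if entry)
    # Step 4: single failover relay, if set.
    if settings.get("_BESClient_RelaySelect_FailoverRelay"):
        order.append(settings["_BESClient_RelaySelect_FailoverRelay"])
    # Step 5: root server as the final attempt.
    order.append("<root server>")
    return order
```

With only a primary relay and a failover relay set, the sketch yields primary, failover, then root.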

_BESClient_RelaySelect_FailoverRelayList

This setting contains a list of failover relays to choose from when no relay listed as primary, secondary, or specified in the tertiary list responded to pings. This setting, first introduced in BigFix 9.0, is a semicolon-delimited list of relays to try. For automatic relay selection, see Relay Affiliation. If specified, this setting overrides _BESClient_RelaySelect_FailoverRelay. (Example: relay1.company.com;192.168.123.32;relay2.company.com)
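
As a rough illustration of how I read that description, the sketch below splits the semicolon-delimited value and applies the "list overrides single relay" rule. The function name and settings dict are hypothetical, and trimming whitespace around entries is my own assumption rather than documented client behavior.

```python
# Illustrative sketch only: not BigFix client code.
def failover_candidates(settings):
    """Return failover relays to try; the list setting overrides the single one."""
    raw = settings.get("_BESClient_RelaySelect_FailoverRelayList", "")
    # Split on semicolons and drop empty entries (whitespace trimming is assumed).
    relays = [entry.strip() for entry in raw.split(";") if entry.strip()]
    if relays:
        return relays
    # Fall back to the older single-value setting only when no list is given.
    single = settings.get("_BESClient_RelaySelect_FailoverRelay", "").strip()
    return [single] if single else []
```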

Meanwhile, other settings that are not directly related to failover are discussed in:
KB0022489: How can I control client relay selection for a particular Relay or DSA Server? (These are discussed in posts on this forum. As of this writing, they are not documented in the List of settings and detailed descriptions.)

  • _BESRelay_Selection_AutoSelectableRelay
  • _BESRelay_Selection_RelayPriority
  • _BESRelay_Selection_RelayWeight

So, some questions.

  1. The text for _BESClient_RelaySelect_FailoverRelay and KB0023371 imply that these options are for manual relay assignment. Is that accurate?
  2. The text for _BESClient_RelaySelect_FailoverRelayList implies that it relates mostly to automatic relay affiliation. Is that accurate?
  3. Aside from having a multivalue list, does _BESClient_RelaySelect_FailoverRelayList add other distinct functional value vs _BESClient_RelaySelect_FailoverRelay?
  4. Both make reference to being invoked in the event of failed ping attempts.
    4.1 Does that mean that either/both take precedence over attempts to connect back to the core server?
    4.2 With regard to discouraging connections to the root server, do clients behave differently with either setting?
  5. Finally, can one use _BESClient_RelaySelect_FailoverRelayList with only one relay value, and essentially ignore _BESClient_RelaySelect_FailoverRelay?

(deep thoughts on a Thursday evening…)

The FailoverRelay and FailoverRelayList settings are functionally the same. Note the difference in the value format, though: the FailoverRelay value includes the “/bfmirror/downloads” path, while FailoverRelayList entries do not.

FailoverRelay and FailoverRelayList come into play after the normal automatic or manual relay selection fails, and before the root server is attempted. Unlike the normal relay selection, the FailoverRelay settings do not attempt to ping the relays; they just try a TCP connect to each, wait for that to connect or time out, and then try the next one.
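
The "try each in order, no ping" behavior can be sketched roughly as below. This is only an illustration, not the client's actual code: the function name is made up, the 5-second timeout is an arbitrary placeholder, and the default port of 52311 is an assumption based on the standard BigFix port.

```python
import socket

# Illustrative sketch only: not the BigFix client implementation.
def first_reachable(relays, port=52311, timeout=5.0):
    """Try a plain TCP connect to each relay in list order; return the first that answers."""
    for host in relays:
        try:
            # No ping and no distance check: just connect, or wait for the timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Refused, unreachable, or timed out: move on to the next entry.
            continue
    return None
```

Because the loop always starts from the first entry, every client with the same list value lands on the same relay as long as it is reachable.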

After all FailoverRelay entries are tried, the client will finally attempt to connect to the root, or to the “Last Failover Relay” if that option is set in the masthead file (applied via BESAdmin).

If the Last Failover Relay is set in the masthead, and the client fails to connect, it stops there without attempting to connect to the root. “Last Failover Relay” is a replacement of the root server attempt, not “in addition to” the root.
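
Putting the whole sequence together, a minimal sketch of the order looks like this. The function and parameter names are hypothetical; the point is only that the “Last Failover Relay” takes the place of the root server as the final entry.

```python
# Illustrative sketch only: names are hypothetical, not BigFix client code.
def connection_plan(selected_relays, failover_relays, root_server, last_failover_relay=None):
    """Return the full attempt order: normal selection, failover entries, final target."""
    # Last Failover Relay replaces the root attempt; it is not tried in addition to it.
    final_target = last_failover_relay if last_failover_relay else root_server
    return list(selected_relays) + list(failover_relays) + [final_target]
```

For example, `connection_plan(["r1"], ["f1", "f2"], "root", last_failover_relay="lfr")` ends with `"lfr"` and never includes `"root"`.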

It is perfectly valid for the FailoverRelayList client setting to have only one entry.

Note that with FailoverRelayList, there is no attempt to randomize or to find the “closest” relay. If you use it as a normal selection method for clients, instead of as a “last effort”, you’d want to either randomize the value when you set it on clients, or use DNS round-robin resolution or something similar to balance clients; otherwise all the clients end up connecting to the first entry in the list and leave the others unused.
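
One way to randomize the value before setting it is sketched below. This is just an illustration in Python of generating a per-client ordering; the function name is made up, and seeding by something like the computer ID (so each client gets a stable but different order) is my own suggestion, not a BigFix feature.

```python
import random

# Illustrative sketch only: how the value gets deployed to clients is up to you.
def shuffled_failover_value(relays, seed=None):
    """Return a semicolon-delimited FailoverRelayList value in shuffled order."""
    rng = random.Random(seed)  # same seed gives the same order; None gives a fresh one
    shuffled = list(relays)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return ";".join(shuffled)
```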


Thank you, @JasonWalker . This is exactly the flavor of discussion/answer I was hoping for! :heart:

@bigfix.mark, drawing your attention here. As relay management is a complex, multifaceted topic, it would behoove HCL to build this kind of holistic, theory-of-operation-based explanation into the documentation and overall mechanisms for relay management.

@atlauren, agreed. I will be opening some internal work items to address this.
