Internet Relays in AWS challenges (hop count)

tsikma · May 16, 2016, 6:48pm

I am trying to migrate several portions of the BigFix infrastructure into AWS but am running into a few different challenges. Currently we have 6 IFRs within the US, 2 in the UK, and 2 in Asia. We are hoping to migrate the 6 in the US to our AWS instance, which is in the US. We are configured to use Relay Affiliation and that portion seems to be working fine. The issue I am having is that BigFix uses hop count instead of quickest response to determine which relay to use.

I tested from the Midwest of the US: the 6 current US servers have a hop count of 13/14 depending on location, the UK has a hop count of 16, and Asia has a hop count of 18. The hop count to AWS is about 9, and then traffic gets bounced around inside AWS for a total of 25-30 hops. When you actually look at the connection speed, AWS is much faster, but these relays are not getting selected because of the high hop count.

If I take the 6 US relays offline, the clients go to UK/Asia instead of AWS. The end solution I would like is a hybrid of the servers in UK/Asia and AWS, where North America machines use AWS and Europe/Asia machines use their respective relays. I looked at the settings for weighting the relays, but that would not fix the issues I am having. Is there a different setting that can be used to select the closest relay by distance and not hop count?

strawgate · May 16, 2016, 8:01pm

Relay Affiliation lists will help you with this: Legacy Communities - IBM TechXchange Community

Essentially you sort your relays into groups:

US_CORP
US_AWS
EMEA_CORP
EMEA_AWS
ASIA_CORP
ASIA_AWS

And then when your clients are doing automatic relay selection you can use “_BESClient_Register_Affiliation_SeekList” to have them only check the relays in their affiliation list. You can even stack the affiliation lists like: “US_AWS;US_CORP;EMEA_AWS;ASIA_AWS” so that if none of the US_AWS relays are online they will try US_CORP, if none of those are online they will try EMEA and so forth.
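To make the fallback order concrete, here is a minimal sketch of the behavior described above. It is not BigFix code: the `Relay` class, the `pick_relay` function, the relay names, and the hop counts are all made up for illustration, and it assumes that within a group the client still picks the relay with the fewest hops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    name: str          # hypothetical relay name
    group: str         # affiliation group the relay advertises
    hops: int          # illustrative hop count from the client
    online: bool = True


def pick_relay(seek_list, relays):
    """Walk the seek list in order; in the first group that has an
    online relay, return the one with the fewest hops."""
    for group in seek_list.split(";"):
        candidates = [r for r in relays if r.online and r.group == group]
        if candidates:
            return min(candidates, key=lambda r: r.hops)
    return None  # nothing in the seek list is reachable


relays = [
    Relay("aws-us-1", "US_AWS", hops=27),
    Relay("corp-us-1", "US_CORP", hops=13),
    Relay("aws-emea-1", "EMEA_AWS", hops=16),
]
seek_list = "US_AWS;US_CORP;EMEA_AWS;ASIA_AWS"
print(pick_relay(seek_list, relays).name)
```

In this toy model the US_AWS relay is chosen even though it has the most hops, because its group comes first in the seek list; hop count only breaks ties inside a group.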

tsikma · May 16, 2016, 9:00pm

Hmmm, makes for an interesting challenge. Given our current process for assigning the SeekList to clients, this might require reworking a large wall of relevance. This is why I was hoping the client could use distance to the relay instead of hop count.

I might be able to use the existing process and solve my problem by making the advertising list Americas;FAILOVERS;* instead of just FAILOVERS, so that a client with Americas in its SeekList will try the Americas servers when it is on the Internet. The big problem is that the SeekList on the clients changes every hour based on the IP of the client machine, so once a machine is on the Internet it defaults to FAILOVERS;* and not the full site;state;country;FAILOVER;* that it has when on the network.

strawgate · May 17, 2016, 12:59am

Correct – you will not be able to change it to latency instead of hop count.

If you are trying to eliminate the local relays what I would probably do is re-advertise your relays so that the corporate relays only advertise at the country level (and then the hops will point clients to the right one in an AWS fail event) but advertise your AWS relays as being available at individual sites.

AWS Chicago could advertise: Milwaukee, Chicago, Cincinnati, Wisconsin, Illinois, Ohio, United States, Failover
Branch Office Relay would only advertise Illinois, or even just United States and Failover
AWS Houston could advertise: cities near Houston, states near Texas, United States, Failover
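A rough sketch of how that plays out for a client in Chicago, under the same assumption that the client walks its SeekList in order and only falls back on hop count within a group. The relay names, the `select` function, the client SeekList, and the hop counts (picked from the ranges quoted earlier in the thread) are illustrative, not taken from a real deployment.

```python
# Groups each relay advertises (from the example lists above).
ADVERTISED = {
    "aws-chicago": ["Milwaukee", "Chicago", "Cincinnati", "Wisconsin",
                    "Illinois", "Ohio", "United States", "FAILOVER"],
    "branch-office": ["Illinois", "United States", "FAILOVER"],
}

# Illustrative hop counts as seen from the client.
HOPS = {"aws-chicago": 27, "branch-office": 13}


def select(seek_list, advertised, hops, offline=frozenset()):
    """Return the relay for the first seek-list group that any online
    relay advertises, preferring fewer hops within that group."""
    for group in seek_list:
        matches = [name for name, groups in advertised.items()
                   if group in groups and name not in offline]
        if matches:
            return min(matches, key=hops.__getitem__)
    return None


# Hypothetical client SeekList: site, then state, then country, then failover.
client_seek = ["Chicago", "Illinois", "United States", "FAILOVER"]

print(select(client_seek, ADVERTISED, HOPS))                           # AWS wins at site level
print(select(client_seek, ADVERTISED, HOPS, offline={"aws-chicago"}))  # falls back to corporate
```

Because only the AWS relay advertises the site-level group, it is chosen regardless of hops; if it goes down, the client drops to the state level, where the corporate relay is the remaining match.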

Relay architecture like this takes time and if there is already a global policy dictating relay discovery you’re going to have a hard time getting around that.