Issues with Relay Architecture

(imported topic written by MBARTOSH)

Is it just me, or do others feel that the relay scheme in BigFix is overly complicated and inadequate? I manage 680 relays across 600 different locations. I have to be very careful about generating WAN traffic, because in many locations the company doesn’t have enough bandwidth for business traffic. We use only workstations as relays. The problems I have with the BigFix relay scheme are:

  1. The six-hour failover rule is a problem. Our relays are workstations, so they are quite often down for more than six hours. If a relay is down, its clients shouldn’t receive packages until the relay is back online.

  2. Automatic relay selection can’t be relied on, because hop counts are unreliable. We have a location on the other side of the U.S. that is one hop away, the same hop count as a workstation in my cube.

  3. The problem with relay affiliation is that it takes two evaluation cycles to apply the settings. In the meantime, a huge package could start downloading across the WAN.

The solution:

Relays need to be defined by the subnets they service. This definition needs to be stored in the database, not a text file on the client.

Before a client starts to download package content, it needs to be told which relay to use, based on the client’s subnet and the configuration in the database.

Relays need to be protected: only machines in a relay’s defined subnets can get content from it, and its clients cannot go anywhere else to get content.

This is a simple concept that would greatly improve and simplify relay configuration.
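The subnet-to-relay proposal above can be sketched as a lookup plus an access rule. This is a minimal illustration, not a BigFix feature: the `RELAY_MAP` table, the relay names, and the functions `assigned_relay` and `may_download` are hypothetical, and the table stands in for the database-backed mapping the post asks for. Unmapped clients get no relay at all, which matches the "no failover" policy.

```python
import ipaddress
from typing import Optional

# Hypothetical subnet-to-relay table. In the proposal this would be stored
# in the BES database, not in a text file on each client.
RELAY_MAP = {
    "10.1.20.0/24": "relay-branch-017",
    "10.1.21.0/24": "relay-branch-017",
    "10.4.8.0/23": "relay-branch-212",
}

# Parse each subnet once so lookups don't re-parse strings.
_NETWORKS = [(ipaddress.ip_network(cidr), relay) for cidr, relay in RELAY_MAP.items()]


def assigned_relay(client_ip: str) -> Optional[str]:
    """Return the relay that owns the client's subnet, or None if unmapped."""
    addr = ipaddress.ip_address(client_ip)
    matches = [(net, relay) for net, relay in _NETWORKS if addr in net]
    if not matches:
        return None
    # If subnets overlap, the most specific (longest prefix) entry wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def may_download(client_ip: str, relay: str) -> bool:
    """Protected-relay rule: a client may use only the relay mapped to its
    subnet, and a relay serves only clients in its own subnets. No failover."""
    return assigned_relay(client_ip) == relay


if __name__ == "__main__":
    print(assigned_relay("10.1.21.57"))                    # relay-branch-017
    print(may_download("10.1.21.57", "relay-branch-212"))  # False
    print(assigned_relay("172.16.0.9"))                    # None
```

Because the decision depends only on the client’s current address and the central table, a laptop that moves to a new branch would be pointed at the right relay before any download starts, rather than after two evaluation cycles.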

One other thing: there is no way of managing packages on relays. There is no way to know which packages are cached on which relays, or whether those packages are complete.
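To make the request concrete, here is a minimal sketch of a per-package readiness check. It assumes each relay could report an inventory of its cached files keyed by a content hash, with bytes present and bytes expected. That inventory format, the `CachedFile` record, and `relays_not_ready` are all hypothetical and do not correspond to any existing BigFix report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CachedFile:
    # Hypothetical inventory record a relay reports for one cached file.
    file_hash: str
    bytes_present: int
    bytes_expected: int

    @property
    def complete(self) -> bool:
        return self.bytes_present >= self.bytes_expected


def relays_not_ready(inventory: dict[str, dict[str, CachedFile]],
                     package_files: set[str]) -> dict[str, list[str]]:
    """For each relay, list the package files (by hash) that are missing or
    only partially downloaded. Fully ready relays are omitted. Only relays
    present in `inventory` are checked."""
    not_ready: dict[str, list[str]] = {}
    for relay, cache in inventory.items():
        gaps = sorted(h for h in package_files
                      if h not in cache or not cache[h].complete)
        if gaps:
            not_ready[relay] = gaps
    return not_ready
```

An empty result would mean every reporting relay holds complete copies of every file in the package, which is the "cached everywhere before I deploy" condition.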

If you want a competitive advantage, add alerting for relays that go down or move subnets.

(imported comment written by BenKus)

Hi mbartosh,

Here are some comments:

  • You can change the 6 hour failover to a smaller value if you wish.
  • Can you give more info about what you mean by the two cycles for relay affiliation?
  • Agents will always find relays in their local subnet first. So if you have a lot of agents in a subnet, you can consider putting a relay in the same subnet.
  • If you want, you can create a mapping of subnet to relay (in fact, the “Location by Subnet” wizard can be modified for this purpose), but you would then need to maintain the mapping yourself.
  • Consider creating a dynamic bandwidth-throttling or locking rule based on relay distance, to avoid bandwidth problems when agents pick a bad relay: http://support.bigfix.com/cgi-bin/kbdirect.pl?id=367
  • In general, relay caches are meant to be self-managing, but if you want to inspect them, does the Cache Management dashboard in BigFix Labs help?
  • If you want an alert for down relays, you can use Web Reports to create a report (or an “alert”) that triggers when a relay hasn’t reported in a while (4 hours?).
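The down-relay check suggested above reduces to comparing each relay’s last report time with a threshold. This is a minimal sketch, assuming the last-report times have already been pulled into a dictionary (for example, from an exported report). The data source, the relay names, and `stale_relays` are hypothetical; this is not the Web Reports API.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Threshold taken from the suggestion in the thread ("4 hours?").
STALE_AFTER = timedelta(hours=4)


def stale_relays(last_report: dict[str, datetime], now: datetime,
                 threshold: timedelta = STALE_AFTER) -> list[str]:
    """Names of relays whose most recent report is older than `threshold`."""
    return sorted(name for name, seen in last_report.items()
                  if now - seen > threshold)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical last-report times for two relays.
    last_report = {
        "relay-branch-017": now - timedelta(hours=1),
        "relay-branch-212": now - timedelta(hours=6),
    }
    print(stale_relays(last_report, now))  # ['relay-branch-212']
```

With workstation relays that are routinely off overnight, the threshold would need tuning to avoid a flood of expected alerts.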

Hope that helps,

Ben

(imported comment written by MBARTOSH)

BenKus,

  1. I don’t want the six-hour window to be smaller. I never want systems to fail over to the BES server or to relays in other locations. I only want machines to get content from relays in their own location.

  2. We have code that sets the “Relay Seek List” and the “Relay Advertisement List”. This code has to run through two 15-minute cycles to set the values for a location. For example, a user works from home overnight over VPN and then goes to a branch location the next day. When the user arrives at the branch, the laptop’s seek list is still set for the server from the night before; it then takes one cycle to determine relevance and another to set the affiliation. In the meantime, the laptop could already have started receiving content from an action that was deployed overnight.

  3. We use a file for subnet mapping, but it is a text file. There should be an interface in BigFix for the mapping, and the mapping should be stored in the database. Editing a text file is too risky.

  4. Hop count doesn’t work as a measure of distance to relays. We have locations in Alaska and Florida that are one hop away from the BES server in Southern California.

  5. Bandwidth throttling is good, but suppose there are 20 clients in a location on a 512 kb circuit with one relay. If the relay goes down, all 20 clients start downloading at the throttled rate, which together is too much. Removing failover would solve the problem: clients would receive nothing until the relay is back online, and throttling would only apply between the relay and the server.

  6. Cache Management only manages the size of the cache; it doesn’t tell you what is cached or where. I don’t want to manage the cache size, because we pick relays with enough free space. I need to know what is cached. For example, if I am deploying Office 2007, I want to know that it is cached on all relays before I start deploying it.
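The bandwidth argument in point 5 is simple arithmetic. The sketch below assumes the "512kb circuit" means 512 kilobits per second and uses a hypothetical per-client throttle of 50 kbps (the thread gives no actual throttle value). It compares WAN load with failover (every client pulls across the WAN) against load with the relay up (only the relay pulls across the WAN).

```python
# Assumed units: kilobits per second. The 50 kbps throttle below is hypothetical.
CIRCUIT_KBPS = 512
CLIENTS = 20


def aggregate_demand_kbps(streams: int, per_stream_kbps: float) -> float:
    """Worst-case WAN load if every stream downloads at the throttle rate."""
    return streams * per_stream_kbps


def max_per_client_kbps(circuit_kbps: float, clients: int, share: float = 1.0) -> float:
    """Largest per-client throttle that keeps all clients within `share` of the circuit."""
    return circuit_kbps * share / clients


if __name__ == "__main__":
    throttle = 50  # hypothetical per-client throttle, kbps
    # Relay down with failover: all 20 clients pull across the WAN.
    print(aggregate_demand_kbps(CLIENTS, throttle))    # 1000
    # Relay up: only the relay pulls across the WAN; clients fetch locally.
    print(aggregate_demand_kbps(1, throttle))          # 50
    # Throttle needed just to avoid saturating the link when failover happens.
    print(max_per_client_kbps(CIRCUIT_KBPS, CLIENTS))  # 25.6
```

Under these assumptions, failover roughly doubles the circuit’s capacity in demand, and avoiding that would require a per-client throttle of about 25.6 kbps, leaving nothing for business traffic.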

(imported comment written by NoahSalzman)

Regarding #5, please check out the Cache Management dashboard in the BigFix Labs site.