Relay MaxChildCount Issue

I have added the below client setting to a relay.

_Enterprise Server_ClientRegister_MaxChildCount = 100

Clients in our environment use automatic relay selection, and I noticed that this relay, which has MaxChildCount set to 100, still accepts clients. The count is now at 150 and rising.

Am I missing something? Or should I do something in addition to this setting?

This setting only applies to Clients that have registered to the given Relay within the last 24 hours.

Note that in general, I recommend avoiding the use of this approach to limit the endpoints registering with a given Relay as it can lead to unexpected behavior given the time requirement described above. There are many different configuration settings and strategies that can be leveraged to ensure that the appropriate Clients are registering with the appropriate Relays.

Thanks @Aram for the clarification. When you get time, can you suggest an approach that can be implemented on the clients so that they choose the closest relay?

Relay Affiliation ( https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Relay%20Affiliation ) is among the better methods to guide and better control the automatic Relay selection process, and can usually be adapted to meet various requirements.

Could you provide a bit more detail around what you are attempting to achieve and/or the challenges you are running into so that we might provide more specific suggestions?

@Aram I will look into Relay Affiliation. Here is the scenario: we have 1 Root Server, 2 TLRs, and 50 SLRs. Over 20k endpoints connect to the 50 SLRs, and they were assigned manually over a period of time.

Now we are planning to automate the relay selection process so that every endpoint points to its nearest relay, and so that no endpoint connects to the 2 TLRs. They should all be mapped to the 50 SLRs.

So, how do I do this in the best possible way?

TLR - Top Level Relays
SLR - Second Level Relays

Here are some high level thoughts:

  • the Relays themselves should leverage manual Relay selection - the SLRs would have assigned primary/secondary TLRs
  • the TLRs would be ‘excluded’ from the automatic selection process by leveraging Affiliation, and configuring their advertisement list to not include *
  • the Clients would have appropriate seeklists configured to ensure they select from the appropriate list of SLRs, and subsequently fail over if none of the SLRs are available. This might be as simple as including all the SLRs in the seeklist, or can be more granular depending on your network and requirements
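
To make the seeklist/advertisement-list idea above concrete, here is a minimal Python sketch of the matching logic as described in this thread. It is a toy model only: the group names (`TopLevel`, `SiteA`, `SiteB`, `SiteC`) and relay names are hypothetical, and a real Client also weighs factors such as network distance and relay availability when choosing among the candidates.

```python
from dataclasses import dataclass, field


@dataclass
class Relay:
    name: str
    # Affiliation groups this relay advertises (hypothetical names).
    # "*" stands for the catch-all group mentioned in the thread.
    advertisement_list: list = field(default_factory=list)


def candidate_relays(seeklist, relays):
    """Walk the client's seeklist in order and return the first group
    that at least one relay advertises, plus the matching relay names."""
    for group in seeklist:
        matches = [r.name for r in relays if group in r.advertisement_list]
        if matches:
            return group, matches
    return None, []


relays = [
    Relay("TLR-1", ["TopLevel"]),    # no "*": ordinary clients never match
    Relay("TLR-2", ["TopLevel"]),
    Relay("SLR-01", ["SiteA", "*"]),
    Relay("SLR-02", ["SiteB", "*"]),
]

# A client at SiteA prefers its local SLR.
print(candidate_relays(["SiteA", "*"], relays))  # ('SiteA', ['SLR-01'])
# A client with no local SLR falls through to "*", which only SLRs advertise.
print(candidate_relays(["SiteC", "*"], relays))  # ('*', ['SLR-01', 'SLR-02'])
```

Because neither TLR advertises `*` or a site group, no client seeklist in this model can ever resolve to a TLR, which is the exclusion effect described in the second bullet.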

I’ll work on this and will get back to you with any other queries related to this.
Thanks again for your support.

@Aram, what about using

_BESRelay_Selection_AutoSelectableRelay

On the TLRs to prevent Endpoints from selecting them?

Sure, that can be done too! If Relay Affiliation is already being leveraged, however, and the TLRs’ advertisement list doesn’t contain *, then the _BESRelay_Selection_AutoSelectableRelay setting essentially becomes redundant.

Nice find @TimRice! I have added this setting to the TLRs.
But I’m also looking for the best way to do it through Relay Affiliation!

The official list of client settings is available from IBM.

@Aram, I’m more a “Belt & Suspenders” kind of Admin. :smile:

Sorry to dig this one up again, but this setting…
Which is correct?

_Enterprise Server_ClientRegister_MaxChildCount

OR _Enterprise ServerClientRegister_MaxChildCount

I’m seeing both mentioned, and am using the former set to 1000, but I have 1200 and 1500 clients on two relays that are set to 1000.

The setting is:

_Enterprise Server_ClientRegister_MaxChildCount

As suggested earlier in this thread, this setting is based on the number of clients that have registered to the given Relay within the last 24 hours. As such, the number of devices reporting that they are connected to the given Relay can exceed the value of this setting, for instance when some of those devices are offline and have not registered recently.
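
As a rough illustration of why the two numbers can differ, here is a small Python sketch of the counting rule as described above. It is a toy model under stated assumptions, not the Relay's actual implementation: the client names and timestamps are made up, and whether the 24-hour boundary is inclusive is a guess.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # the 24-hour window described in this thread


def counted_children(last_registrations, now):
    """Count clients whose last registration falls inside the window.
    (Strict '<' at the boundary is an assumption.)"""
    return sum(1 for t in last_registrations.values() if now - t < WINDOW)


now = datetime(2024, 1, 2, 12, 0)

# Hypothetical data: 1000 clients registered an hour ago,
# 200 more last registered two days ago and are now offline.
last_registrations = {f"client-{i}": now - timedelta(hours=1) for i in range(1000)}
last_registrations.update(
    {f"offline-{i}": now - timedelta(days=2) for i in range(200)}
)

print(len(last_registrations))                    # 1200 clients list this relay
print(counted_children(last_registrations, now))  # 1000 count toward the limit
```

In this model the Relay is exactly at a limit of 1000 even though 1200 devices still show it as their Relay.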

I generally recommend against leveraging this setting as it can lead to unexpected (and even undesirable) behavior, and would suggest instead a number of other Relay selection/configuration strategies.

Just FYI, there is no reason to limit a relay to exactly 1000 endpoints if the relay is dedicated to the task and has enough resources. Windows Relays have some default OS settings that mean that more than 2048 endpoints could be a problem without adjusting those settings, and even then I’ve seen 3000 endpoints on a single Windows relay with no significant issues. Linux-based Relays don’t have the same limitation and could have many more endpoints connected at once.

Thanks all… I see the underscore got removed in the post, hence the confusion.
I was trying to reduce the load on two of the relays, but as they are dedicated and have some decent horsepower, I won’t worry too much.
In the past I have also restricted registrations by adjusting the firewall to block a specific subnet, and that worked pretty well.

If you have a relay that is on dedicated hardware, then generally the only limitation is the number of simultaneous TCP connections that the OS can handle and not the hardware, especially if using SSD storage. The other bottleneck could be the network card if it only has 1 gig and you have tons of downloads going at once across the LAN, but that can be solved with a 10gig NIC.

Relays don’t tend to need a lot of RAM or CPU. I think 4 cores and 8GB of RAM is more than needed in most cases. A lot of BigFix processing tends to be single threaded, so you are usually better off with fewer but faster CPU cores than many slower cores.

I like to see fairly large relay caches for top level relays and relays behind slower WAN links. I also think that consumer level SSDs like the Samsung 850 Pro work well since the caches tend to be write once read many. You can get a good size SSD these days for fairly low cost, sometimes cheaper than 15k SCSI disks.

This thread is very helpful.

Could someone please explain how often the Relay checks for inactive connections and drops them so that it can accept more connections?

For instance, suppose _Enterprise Server_ClientRegister_MaxChildCount = 1000 and all connections are currently active, and 30 minutes later 300 workstations go offline. How does the Relay decide to drop those 300 workstations and allow 300 new ones to connect?

The timing calculation is based on when a given Client registers with the Relay in question. A given registration is counted towards the MaxChildCount so long as it was within the last 24 hours.
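
Applied to the scenario in the question, the rule above can be sketched in Python as follows. This is only an illustration of the stated rule, not the Relay's code: the timestamps are hypothetical, and the exact boundary behavior at the 24-hour mark is assumed.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # window stated in the answer above


def slot_free_at(last_registration):
    """Time at which this registration stops counting toward MaxChildCount."""
    return last_registration + WINDOW


def is_counted(last_registration, now):
    """True while the registration is still inside the 24-hour window."""
    return now < slot_free_at(last_registration)


registered = datetime(2024, 1, 1, 9, 0)            # hypothetical last registration
went_offline = registered + timedelta(minutes=30)  # workstation powers off

# Going offline changes nothing; only the registration time matters.
print(is_counted(registered, went_offline))                                  # True
print(is_counted(registered, registered + timedelta(hours=23, minutes=59)))  # True
print(slot_free_at(registered))  # 2024-01-02 09:00:00
```

Under this reading, the 300 offline workstations keep occupying their slots until 24 hours after each one's last registration, rather than being dropped when they disconnect.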
