BigFix Relay capacity - 5,000

For our dedicated “leaf node” Relay servers, I’ve always had a hard max of 2,000 clients per Relay. With the new Relay capacity in 9.5 of up to 5,000 clients per Relay, I’d like to bump up my max values.

The Capacity Guide (https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/90553c0b-42eb-4df0-9556-d3c2e0ac4c52/page/0f46e90f-1a98-43d3-8f78-a6186ce3b9d3/attachment/787d37fc-9f6e-4aa6-9a7d-98f78a3ffee3/media/BigFix%20Capacity%20Planning%20v9.x.17.pdf) gives these suggestions for hardware specs for “high capacity” relays.

So for example, if I have a 2 CPU / 4GB RAM VM (Windows or Linux), it should be able to handle ~3,000 clients, and a 4 CPU / 8GB RAM VM could handle 5,000, with no other configuration needed?

Negative on the “no other configuration is needed”. I’m pretty sure the capacity guide you linked has something in it, and there might also be some release notes in 9.5.9 or 9.5.10 related to it, but… I think there is OS tuning that needs to be applied (setting TcpTimedWaitDelay to a shorter value to reuse ports more quickly), as well as the BESRelay setting for HTTPMaxConnections.

This is how I net out the requirements for large-scale (5k) relays from the capacity planning guide information and my own recommendations:

  • 2-4 CPU cores
  • 4-8GB RAM
  • More storage (more varied sites, mailboxsites hosted)
  • Increase _BESRelay_HTTPServer_MaxConnections to 10,240
  • Adjust _Enterprise Server_ClientRegister_BatchCount/BatchDelay, for responsiveness needs
  • Windows: Reduce TcpTimedWaitDelay to 30
  • Linux: Increase nofiles ulimit to at least 16384

And where would the TCL fit into this? One could assume that installing the Tiny Core Linux (.iso) from the download page and configuring a relay based on those instructions would be all that is needed. True?

How does _Enterprise Server_ClientRegister_BatchCount have no default setting? Surely a relay sends UDP pings. I’ve read the description here, but perhaps I’m not understanding.

That is a bug in the doc. The default BatchCount is 100, so a relay serving 1000 endpoints would notify 100 each second (default BatchDelay) for 10 seconds. If the relay serves 5000 endpoints, it would notify them over 50 seconds, by default. Depending on what activity is being triggered by the notification (e.g. site gather vs fast query vs large download), that rate could be much faster than the relay can serve. So it makes sense to me to increase the delay and/or lower the count a bit to minimize requests by endpoints that we don’t expect to be serviceable.

Unfortunately, there is no single answer for best values to support 5k, so you may have to try a few options and monitor to see how endpoints reporting to that relay respond to various requests. Logical options to try are:

  • Default values
  • Increase BatchDelay to 2000
  • Decrease BatchCount to 50 and increase BatchDelay to 2000
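
As a rough worked example of the arithmetic above, the sketch below estimates how long one notification pass takes for each of the three options on a relay serving 5,000 endpoints. It assumes BatchDelay is expressed in milliseconds (inferred from the one-second default and the suggested value of 2000) and that exactly one batch goes out per delay interval; the function name notify_window_seconds is made up for illustration.

```shell
#!/bin/sh
# notify_window_seconds ENDPOINTS BATCH_COUNT BATCH_DELAY_MS
# Prints the approximate number of whole seconds needed to notify every
# endpoint, ASSUMING one batch is sent per delay interval and the delay
# is given in milliseconds.
notify_window_seconds() {
    endpoints=$1
    batch_count=$2
    batch_delay_ms=$3
    # Round up: a partial final batch still takes one interval.
    batches=$(( (endpoints + batch_count - 1) / batch_count ))
    echo $(( batches * batch_delay_ms / 1000 ))
}

# The three options from the list, for a relay serving 5000 endpoints:
notify_window_seconds 5000 100 1000   # default values
notify_window_seconds 5000 100 2000   # longer delay
notify_window_seconds 5000 50 2000    # smaller batches and longer delay
```

Under those assumptions the three options spread the notifications over about 50, 100, and 200 seconds respectively, which gives a starting point for the monitoring described above.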

I’ll have to double check if TCL has been validated for 5k support yet. I know that it wasn’t as part of the initial announcement, but was planned to be validated soon after.

Thanks for the info. It’d be nice if there were a fixlet in the BES Support site that set these Relay configurations:

Increase _BESRelay_HTTPServer_MaxConnections to 10,240
Adjust _Enterprise Server_ClientRegister_BatchCount/BatchDelay, for responsiveness needs
Windows: Reduce TcpTimedWaitDelay to 30
Linux: Increase nofiles ulimit to at least 16384

What would be the suggested BatchCount and BatchDelay if we have 150K endpoints?

I searched the web pages related to it and found many different numbers for the hard/soft ulimit on Linux/Unix machines, so I am getting more confused.

Can anyone tell us which command to run, and with which ulimit values, to scale a Relay up to 5K?

How can we do this? Is there a task available in BigFix, or do we have to do it the traditional way?

These settings just affect an individual relay, so the number of total endpoints does not have any impact. I’d suggest trying the values I mentioned above.

There is no task in BigFix to do this, but you can certainly use BigFix to change these values in a custom task. Generally, both hard and soft ulimits need to be changed, as the hard limit must be greater than or equal to the soft limit in order for the soft limit to be honored.

Thanks Steve!

I searched the internet about ulimit, and there are three types of limit: soft, hard, and unlimited. Soft and hard limits are set per user; can’t we set the limit system-wide, since our client interacts with the system?

I am still a bit confused; please help me with more clarity.

It’s different across different platforms, but here’s a thread in this forum that discusses how to do it for RHEL 7: Ulimit on RHEL 7 and BES 9.5 server

In general, there are hard limits and soft limits for all system limits. Soft limits can be changed by each user, and affect their particular shell/processes. Hard limits can only be raised by root and act as a ceiling, meaning that soft limits cannot exceed the hard limits. Unlimited is a value that you can change the hard and soft limits to.

Since our processes run as root, you need to change the soft limit as root, and possibly the hard limit (if it is not already unlimited or 16384+). The limit we care about is the ‘nofiles’ limit which refers to ‘number of files’ or file handles. A typical command to change this is

ulimit -n 16384

or

ulimit -n unlimited

Depending on the OS, you may need to add this command to a shell setup (or similar) script, or add the equivalent entry into the limits.conf file (e.g. root soft nofile unlimited).
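
To see what a shell currently has before changing anything, a small check along these lines may help. It is only a sketch: the function name nofile_at_least is invented here, the 16384 threshold is the value suggested earlier in this thread, and the limits.conf lines in the comment show the general entry format rather than a verified configuration for any particular OS.

```shell
#!/bin/sh
# Possible /etc/security/limits.conf entries (applied to new login sessions),
# using the 16384 value suggested in this thread:
#   root  soft  nofile  16384
#   root  hard  nofile  16384

# nofile_at_least MIN
# Succeeds when the soft open-file limit of the current shell is MIN or more.
nofile_at_least() {
    current=$(ulimit -Sn)
    # "unlimited" is not a number, so handle it before the numeric test.
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$1" ]
}

echo "soft nofile: $(ulimit -Sn)"
echo "hard nofile: $(ulimit -Hn)"
if nofile_at_least 16384; then
    echo "soft limit is already at least 16384"
else
    echo "soft limit is below 16384"
fi
```

Run as root in the same kind of session that starts the relay, this shows whether the soft limit needs raising and whether the hard limit leaves room to raise it.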

There’s a fixlet for that!
https://bigfix.me/fixlet/details/26606

Nice. An addition to that would be

  1. Windows and RHEL7 logic in the same fixlet
  2. For RHEL, have the relevancy look for the ulimit config in filelimits.conf (not just that the file doesn’t exist) so it is only relevant if it needs to be set. Not sure what the relevancy would be for Windows.

Agreed… but to be honest that’s sort of a one-time thing… and I just didn’t care enough. I only wrote that because I had 30 relays in our data center to set. I don’t care much about the 3k relays we have out in the field since they will never see over 2k clients.

Also I wrote it to inject other service names at the time of deploy, like web reports, bes server, bes relay, etc.

As for windows logic, what is a windows server? I didn’t know they made those! :nerd_face:

I am a bit confused. This fixlet refers to the file /etc/systemd/system/besrelay.service.d/filelimits.conf, but when I look for the hard and soft nofile settings, they are in /etc/security/limits.conf.

Which one is correct and needs to be modified?

I worked on the following for changing the ulimit:

appendfile LIMITS_FILE=/etc/security/limits.conf
appendfile TEMP_FILE=/tmp/myTmpLimits_`hostname`.`date +%y%m%d%H%M`.txt
appendfile BKUP_FILE=/tmp/limits_`hostname`.`date +%y%m%d%H%M`.conf
appendfile 
appendfile ### First remove "End of file" using grep's -v parameter, dump output to a temporary file.
appendfile grep -v "End of file" "$LIMITS_FILE" > "$TEMP_FILE"
appendfile ### Append whatever you need to the temporary file.
appendfile echo "*                hard     nofile          16384" >> "$TEMP_FILE"
appendfile echo "*                soft     nofile          9132" >> "$TEMP_FILE"
appendfile ### Replace the "End of file" marker that we removed above.
appendfile echo "#End of file" >> "$TEMP_FILE"
appendfile ### Make a backup of the original.
appendfile cp "$LIMITS_FILE" "$BKUP_FILE"
appendfile 
appendfile ### Put the new version of the file in place.
appendfile cp "$TEMP_FILE" "$LIMITS_FILE"

// move the appendfile into place and make it executable (555 already includes execute)
move __appendfile "{(client folder of current site as string) & "/limits.sh"}"
wait chmod 555 "{(client folder of current site as string) & "/limits.sh"}"

// execute the shell script as written
run /bin/sh "{(client folder of current site as string) & "/limits.sh"}"

I also want to check how to write the relevance for the hard/soft limits, so that it is true if they are not in the file or not set to the defined value.

It really depends on the OS. RHEL 7 and CentOS 7 use systemd to start services (systemctl command). RHEL 6 and CentOS 6 use init scripts. Init scripts will always use the limits.conf file to set the limits, but services started by systemd ignore the limits.conf file and require the override file /etc/systemd/system/(servicename).service.d/filelimits.conf. Really, it will load any .conf file in that (servicename).service.d folder.

Here’s the fun part. You can still use “service besrelay restart” on RHEL/CentOS 7 and when you do that, it will honor the limits set for your shell, which is where the limits.conf file comes in. You should really set both to cover your butt.
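
For the systemd side, the override file can be as small as a [Service] section with a LimitNOFILE line. The helper below is a minimal sketch of writing one, not the contents of the fixlet linked earlier: the function name write_nofile_dropin is invented, and the directory is passed in as an argument so it can be tried against a scratch folder first.

```shell
#!/bin/sh
# write_nofile_dropin DROPIN_DIR LIMIT
# Writes DROPIN_DIR/filelimits.conf containing a systemd [Service] override
# that sets the open-file limit for the unit the directory belongs to.
write_nofile_dropin() {
    dropin_dir=$1
    limit=$2
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dropin_dir/filelimits.conf"
}
```

On a RHEL 7 relay this might be called as write_nofile_dropin /etc/systemd/system/besrelay.service.d 16384, followed by systemctl daemon-reload and a restart of the service, all as root. The besrelay unit name is taken from the path mentioned in this thread; check it against your own install.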

What OS are you running?

So we have to keep up with both files?

Our relays are on RHEL.

Still coming back to the important question: what would the relevance be to identify whether both files have these settings or not? Also, from what I have seen, changes to the limits.conf file do not take effect until the Linux box is restarted, so is there any option for the case where someone has implemented the settings but not restarted the box?

What I am thinking of is showing a pending restart status for servers where someone implemented these settings and forgot to restart, so the status will keep showing as pending restart.

Suggestions, please.
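
One possible way to approach the "settings applied but not yet picked up" case is to look at the limit the running process actually has, which Linux exposes in /proc/<pid>/limits. The sketch below is only an illustration under that assumption: the function names are invented, and the file path is passed in so the parsing can be tried on a saved copy.

```shell
#!/bin/sh
# soft_nofile_from_limits FILE
# Prints the soft "Max open files" value from a file in /proc/<pid>/limits
# format (fields: Max open files <soft> <hard> files).
soft_nofile_from_limits() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# needs_restart FILE WANTED
# Succeeds (exit 0) when the soft limit found in FILE is below WANTED, i.e.
# the running process has not picked up the new setting. A missing or
# unreadable FILE is treated as a limit of 0, so it also reports "needs restart".
needs_restart() {
    soft=$(soft_nofile_from_limits "$1")
    [ "$soft" = "unlimited" ] && return 1
    [ "${soft:-0}" -lt "$2" ]
}
```

On a relay, something like needs_restart "/proc/$(pgrep -o -x BESRelay)/limits" 16384 could then drive a "pending restart" style property; the BESRelay process name is an assumption to verify on your own systems. In many cases restarting the relay service from a fresh root session is enough for the new limit to apply, so it may be worth testing that before scheduling a full reboot.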