Optimizing Communication for Internet-Facing Relays: Tips and Best Practices

adeilson · November 7, 2024, 3:02pm

Hello,

I’m seeking advice on optimizing the communication speed for our internet-facing relay setup.

Currently, we have a relay deployed in Azure that connects with our internal network’s parent relay. This Azure relay supports remote users by distributing necessary content. However, I’ve noticed that client machines often experience significant delays—sometimes up to an entire day—in receiving updates. Additionally, relay status updates to the console can take as long as six hours to reflect accurately, with the relay sometimes appearing as inactive during this period.

I understand that the relay is delivering content, but I’d like to reduce the communication intervals for a more responsive setup. I suspect that the Azure relay’s inability to use UDP ping might be contributing to these delays. Could this be the primary cause?

To achieve this, I’m aiming for the following:

- Clients should receive updates within 30 minutes at most.
- Faster and more consistent communication between the Azure relay and our internal parent relay.
- Improved synchronization between the parent relay and the Azure relay, ensuring timely delivery of content and status updates to clients.

Are there configurations I should adjust on the parent relay to improve communication flow? Any suggestions for tuning the settings to enhance the responsiveness of this relay setup would be greatly appreciated. I hope this discussion will also benefit others looking to optimize similar environments.

Thank you!

JasonWalker · November 7, 2024, 4:04pm

I would start by combining two distinct features:

- Command Polling, to improve the ‘worst-case’ scenario.
- Persistent Connections, to make the ‘best-case’ better.

Bear in mind that notifications from your Azure relay to clients would normally be sent over udp/52311, and they will almost certainly be blocked before they reach the client, either by the firewall/router at the user’s site or by a home router (because of NAT, and possibly firewall rules).

In contrast, notifications from your internal network relay to the Azure relay will be sent over tcp/52311. So make sure tcp/52311 is open from the internal relay to the Azure relay as well.
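A quick way to confirm the TCP path is a plain connect test, run from the internal relay against the Azure relay. This is only a minimal sketch: the function name `tcp_port_open` is made up for illustration, and a successful connect shows only that the port is reachable, not that the relay service is healthy.

```python
import socket


def tcp_port_open(host: str, port: int = 52311, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `tcp_port_open("azure-relay.example.com")` on the parent relay (the hostname is a placeholder) should return `True` if the notification path is open.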

These notifications tell the downstream relay or client that there is new content (a site has been updated, an action has been issued, and so on) and that it should gather the updates.

When those notifications are blocked, your two main options are:

Command Polling - described at Setting up internet relays
- On a regular schedule, the client or relay will ‘poll’ its parent to see whether there is new content to gather.
- Depending on your deployment size, you might set this to occur every 30 minutes or every hour. The more clients you have, the less frequently you should poll, to reduce workload on the parent relay.
- Configure this on both your clients, and on your Azure relay to poll its parent relay.
- _BESClient_Comm_CommandPollEnable = 1
- _BESClient_Comm_CommandPollIntervalSeconds = 1800
Persistent Connections - you can configure your clients (and, as of 11.0.3, your Relays as well) to establish a persistent TCP connection to their parent, and leave the connection open. Notifications are then sent over this open TCP connection instead of using new UDP or TCP connections downward.
Persistent connections
- On upstream Relays (both the Azure relay and your on-site relay):
  - _BESRelay_PersistentConnection_Enabled = 1
- On your downstream Clients:
  - _BESClient_PersistentConnection_Enabled = 1
- On your Azure Relay (to establish a persistent connection to its parent Relay, assuming both relays are at 11.0.3 or higher):
  - _BESRelay_PersistentConnection_OpenParent = 1
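To keep track of which setting goes on which machine, the values above can be grouped by role. This is just a bookkeeping sketch: the role names and the `checklist` helper are invented for illustration, and the setting names and values are the ones listed in this post.

```python
# Settings from this post, grouped by the machine they apply to.
# The role names (dictionary keys) are labels made up for this sketch.
SETTINGS_BY_ROLE = {
    "remote_client": {
        "_BESClient_Comm_CommandPollEnable": "1",
        "_BESClient_Comm_CommandPollIntervalSeconds": "1800",
        "_BESClient_PersistentConnection_Enabled": "1",
    },
    "azure_relay": {
        # The Azure relay polls its parent relay as well.
        "_BESClient_Comm_CommandPollEnable": "1",
        "_BESClient_Comm_CommandPollIntervalSeconds": "1800",
        # Accept persistent connections from downstream clients.
        "_BESRelay_PersistentConnection_Enabled": "1",
        # Open a persistent connection to the parent (11.0.3 or higher).
        "_BESRelay_PersistentConnection_OpenParent": "1",
    },
    "parent_relay": {
        "_BESRelay_PersistentConnection_Enabled": "1",
    },
}


def checklist(role: str) -> list[str]:
    """Return sorted 'name = value' lines for one role."""
    return [f"{name} = {value}" for name, value in sorted(SETTINGS_BY_ROLE[role].items())]


for role in SETTINGS_BY_ROLE:
    print(role)
    for line in checklist(role):
        print("  " + line)
```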

You may also need to tune the allowed number of persistent connections on your Azure relay, if it’s serving a large number of clients and they all need the persistent connections:

_BESRelay_PersistentConnection_MaxNumber
The default is 100. If this device is dedicated to the Relay function, you can increase it. I don’t think we offer specific guidance on “how many is too many”, so you’ll need to monitor performance and reliability, but I would say that anywhere from 1000 to 5000 is likely to be OK.

It might also be useful to review the related settings, which are cataloged at List of settings and detailed descriptions

fermt · November 7, 2024, 4:07pm

Something we have implemented for systems that use a DMZ relay and have a decent internet download speed is direct downloads for Microsoft Windows patches. Instead of downloading the content from the relay, the clients download it directly from the Internet when possible.
For responsiveness, have you enabled command polling at the client level?
Since the clients are outside your enterprise network, they may not receive the UDP notifications when using a DMZ/Internet relay, which delays any type of content update.

adeilson · November 7, 2024, 5:33pm

Thank you for the information, @JasonWalker

I’ve implemented the recommended steps and it’s looking better so far, but I have some questions about the persistent connection.

Many of our client computers work remotely, but some occasionally come into the office. It’s challenging to determine exactly who is always working from home and who has a hybrid schedule.

I enabled 30-minute command polling for all our laptop clients. However, I’m hesitant to enable persistent connections since we have around 500 laptops spread across multiple relays (12 in total). Our environment is relatively small, with about 2,500 computers, so we have more than enough relays to meet network demand and deliver content efficiently.

If I enable persistent connections, would that mean that when a client connects to our internal network and selects an internal relay, it will also establish a persistent connection with that internal relay?

We have about 160 laptops that connect to our DMZ relay, for which establishing a persistent connection would be fine. My concern is whether this will also apply to our internal laptops, which total around 350 and are distributed across multiple relays nationwide and in two other countries.

Thank you.

JasonWalker · November 7, 2024, 5:38pm

The Relay has to be configured to accept a Persistent Connection. So you could configure only the Azure and DMZ relays to accept Persistent Connections (along with whatever is the parent of the Azure relay, if it’s not the DMZ relay itself).

If you don’t configure your internal relays with the _BESRelay_PersistentConnection_Enabled value, your internal clients will check when they register whether Persistent Connections are allowed on the relay, but the internal relays’ response will prevent the internal clients from establishing persistent connections.
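As a simplified model of that behavior (not the actual client logic, and ignoring limits such as the maximum number of connections), a persistent connection is established only when both sides opt in. The function name is hypothetical:

```python
def uses_persistent_connection(client_enabled: bool, relay_accepts: bool) -> bool:
    """Simplified model: both the client and its current relay must opt in."""
    return client_enabled and relay_accepts


# The same laptop, with _BESClient_PersistentConnection_Enabled = 1, in two locations:
at_home = uses_persistent_connection(client_enabled=True, relay_accepts=True)      # Azure/DMZ relay
in_office = uses_persistent_connection(client_enabled=True, relay_accepts=False)   # internal relay
print(at_home, in_office)
```

So the client setting can be applied to every laptop, and the relay-side setting decides where persistent connections actually happen.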

adeilson · November 7, 2024, 5:44pm

Sounds good!

Also, with persistent connection enabled (yes, my relay, server, and clients are all on version 11.0.3), does this mean I can skip using command polling, or would combining them still be beneficial or even necessary?

JasonWalker · November 7, 2024, 5:48pm

I usually recommend every environment turn on Command Polling, just to help with the worst case of “too many persistent connections for the Relay to handle” or “some firewall keeps forcing the TCP connection to time out”.
I just tune it to a polling interval of one to two hours.

That said, if you haven’t used it before, you may not need to add it now.

But if you ever have just a few cases of Windows Firewall or your antivirus blocking the inbound UDP at the client, it’s well worth deploying Command Polling. At the scale you’re talking about, the workload it adds to the relays is negligible.
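For a rough sense of that workload, here is some back-of-the-envelope arithmetic using the laptop counts mentioned earlier in the thread. It assumes each poll is a single request and that polls are spread evenly over the interval, which is a simplification; the function names are made up for this sketch.

```python
def polls_per_second(clients: int, interval_seconds: int) -> float:
    """Average poll rate if `clients` each poll once per interval, evenly spread."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return clients / interval_seconds


def seconds_between_polls(clients: int, interval_seconds: int) -> float:
    """Average gap between consecutive polls arriving at the relay."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return interval_seconds / clients


# 160 laptops on the DMZ relay, polling every 30 minutes (1800 s):
print(seconds_between_polls(160, 1800))  # 11.25 -> about one poll every 11 seconds

# All 500 laptops hitting a single relay, which overstates the real case
# because they are spread across several relays:
print(round(polls_per_second(500, 1800), 3))  # 0.278 polls per second
```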