Relay Behind NAT with no forwarding

TheChad · March 28, 2016, 9:08pm

I have a situation where a network is behind NAT and there is no way to set up port forwarding for the relay at that site (firewall has dynamic IP, etc). Of course command polling does nothing, since the local client on the relay points to itself and will return “commands to process: 0” until the relay gets the new content from its parent.

I can see this behavior on the relay’s client as well as on other clients on that network. If I restart the relay, it checks upstream, gets new content and notifies its clients (GatherHashMV). The clients then get the content from the relay immediately.

I thought that setting _BESRelay_GatherMirror_UpstreamCheckPeriodMinutes to something like 3 minutes (this is a lab environment) would make the relay do the same as when it’s restarted… check upstream and get new content. But it seems to have no effect at all.

I also tried setting _BESRelay_GatherMirror_ResolveVersionConflicts to 1… No effect.

How in the world can you get a relay on an isolated site behind NAT to go upstream periodically and get new content?

StefanoBelluomini · March 29, 2016, 6:24am

Being behind a dynamic IP address shouldn’t matter: IEM updates the source address attribute of every agent each time it sends a report. So when you push an update back to that machine, it will know the new dynamic IP address.

Refer to the note in the following article: http://www-01.ibm.com/support/docview.wss?uid=swg21505929

TheChad · March 29, 2016, 12:34pm

Right… But… A) the relay is behind 2 NATs and B) Neither firewall can be configured to forward traffic.

The article you cite is for a DMZ in a corporate network, which is not the case here.

Surely there has to be some way to tell a relay to act like a client with command polling enabled. That sure seems like what _BESRelay_GatherMirror_UpstreamCheckPeriodMinutes is intended to do. It just doesn’t.

TimRice · March 29, 2016, 1:33pm

According to the Client Settings documentation, _BESRelay_GatherMirror_UpstreamCheckPeriodMinutes should do what you are looking for…

This setting controls the minimum amount of time a relay (or a root server in DSA deployments) will wait between checking for new versions of sites. In a network with full connectivity, this polling behavior will be unimportant, because relays will always receive notifications when new sites become available. But when notifications get missed, this polling behavior allows a relay to “catch up”. The downside to polling too frequently is that it can drive unnecessary load into the parent. The polling will only be done in response to a client request, so if none of a relay’s children are asking for a site, it won’t go upstream to check for the site, even if its polling period has expired. Note that a “request” in this case means any query about the status of the site – so a command polling request from a Client can trigger this upstream check even though it’s not directly requesting any sites.

This setting does not directly control a “relay gather interval”, as there is no such thing. The Relay only initiates gather requests in response to notifications received from its parent or gather requests received from one of its children. However, if a Relay has many children, gather requests will come in frequently, and the amount of time between upstream checks will end up being very close to the minimum amount of time specified by this setting.

Are the BESClients behind the NAT configured for Command Polling themselves? The above documentation seems to indicate that if Clients never ask for a site, it won’t matter whether the Relay is configured to poll for it.
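
To make the quoted behavior concrete, here is a minimal toy model of it in Python. This is not BigFix code: the RelayModel class, its field names, and upstream_check_times are hypothetical, and the only rule modeled is the one stated in the documentation above (an upstream check happens only on a client request, and only once the minimum period has passed since the last check).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RelayModel:
    """Toy model of a request-driven upstream check (hypothetical, not BigFix code)."""

    check_period_minutes: float
    last_check: Optional[float] = None  # minute at which the last upstream check ran

    def on_client_request(self, now: float) -> bool:
        """Return True if this client request triggers an upstream check."""
        period_expired = (
            self.last_check is None
            or now - self.last_check >= self.check_period_minutes
        )
        if period_expired:
            self.last_check = now
            return True
        # Period has not expired yet: answer from the local mirror only.
        return False


def upstream_check_times(period: float, request_times: Iterable[float]) -> List[float]:
    """List the request times (in minutes) at which the relay would go upstream."""
    relay = RelayModel(check_period_minutes=period)
    return [t for t in request_times if relay.on_client_request(t)]


if __name__ == "__main__":
    # Clients polling every minute, 3-minute minimum period.
    print(upstream_check_times(3, range(0, 11)))
    # No client requests at all: the relay never goes upstream in this model.
    print(upstream_check_times(3, []))
```

In this model a short period alone achieves nothing if no child ever asks the relay about a site, which is why the client-side command polling question matters.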

AlanM · March 29, 2016, 4:31pm

Note that clients do have default periods where they will request a site (determined by the masthead of the external site or the setting in the deployment masthead for custom/actionsites) and do perform a “command poll” every time they re-register.

TheChad · March 29, 2016, 6:33pm

Yes. They are. I see the clients poll the relay and I see the relay (debug logging on) respond with:
/cgi-bin/bfenterprise/clientregister.exe (3224) - Fetching the versions of all known sites in response to command poll from client 2945518
And the client logs commands to process: 0
But there are definitely actions pending for the clients that they are not getting.

Note: The relay does seem to be checking upstream now, but not at the interval I specified. It takes about 15 minutes.

jgstew · July 19, 2016, 9:07pm

I thought relay to relay communication was over TCP. If a child relay opens a persistent TCP connection with the parent, then wouldn’t that mitigate this condition?

AlanM · July 19, 2016, 10:21pm

That could address the problem, yes. It doesn’t currently use a persistent connection though, nor does its parent know that it should communicate downstream over that connection.

Wolf359 · August 29, 2018, 10:47am

Was there ever a resolution to this issue?

I have also configured my relay’s client settings with:
_BESClient_Comm_CommandPollEnable
_BESClient_Comm_CommandPollIntervalSeconds
_BESRelay_GatherMirror_UpstreamCheckPeriodMinutes

But this doesn’t make any difference. The client does poll its local relay service (on the same instance) as expected. But the Relay is not performing a gather to its parent Relay as I would have hoped. The only way I can force a refresh is to manually restart the besrelay service. This isn’t really a viable solution, and the default cycles are too long to have to wait.

jgstew · August 29, 2018, 8:51pm

It seems like _BESRelay_GatherMirror_UpstreamCheckPeriodMinutes is the setting that would resolve the issue of a relay behind a NAT or similar. The default value seems to be 6 hours, but if you set that to something more aggressive, then that should be about the max time to wait for new content to bridge the gap.

It would be better if port 52311 could be forwarded on the NAT to the child relay behind it.

bpastore · September 11, 2018, 5:48pm

Another setting that has an impact in the relay-behind-NAT scenario is the parameter:

_BESGather_Mirror_SiteVersionPollingPeriod

Its value is 30 minutes by default. This is the time interval between two consecutive checks of the site version on the parent relay. Let me clarify this concept.

Every time a client asks the relay for a new site, the relay should in theory check with its parent whether a new site version exists. This setting controls how often the relay performs this check: if it is set to 30 minutes, the child relay waits 30 minutes before checking with the parent relay whether a new site version is available, even if clients are continuously pinging it to check for new site versions.

Usually this setting is not very relevant: if the parent gathers a new site version, it notifies the child relays, which immediately gather the new site version (no need to wait any time). But because the notifications are blocked in a NAT environment, this parameter becomes relevant: it prevents a relay from checking whether the parent has a site update more often than every “_BESGather_Mirror_SiteVersionPollingPeriod” minutes.

Reducing the value of this setting in the child relay configuration allows it to check more frequently whether a site update is available on the parent relay (again, in case pings are not reaching the child relay). But decreasing it too much can have a negative effect on the performance of the parent relay, which gets pinged too often.
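
The trade-off described above can be put into rough numbers. The following is only a back-of-the-envelope sketch under two assumptions that are not confirmed by the product documentation: the relay checks upstream on the first client request that arrives once the polling period has expired, and clients poll at a fixed interval. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def effective_check_interval(polling_period_min: int, client_poll_interval_min: int) -> int:
    """Minutes between upstream checks in this simplified model.

    The relay can only check on a client request, so the real interval is the
    polling period rounded up to a whole number of client poll intervals.
    """
    if polling_period_min <= 0 or client_poll_interval_min <= 0:
        raise ValueError("both intervals must be positive")
    # Integer ceiling division, avoids floating point rounding.
    polls_per_check = -(-polling_period_min // client_poll_interval_min)
    return polls_per_check * client_poll_interval_min


def upstream_checks_per_day(polling_period_min: int, client_poll_interval_min: int) -> float:
    """Approximate number of checks one child relay sends to its parent per day."""
    interval = effective_check_interval(polling_period_min, client_poll_interval_min)
    return MINUTES_PER_DAY / interval


if __name__ == "__main__":
    # Compare a few polling periods with clients polling every 5 minutes.
    for period in (30, 10, 3):
        interval = effective_check_interval(period, 5)
        per_day = upstream_checks_per_day(period, 5)
        print(f"period={period} min -> check every {interval} min, {per_day:.0f} checks/day")
```

Under these assumptions the interval is also roughly the longest time new content can wait on the parent before the child relay notices it, and lowering the period below the client polling interval buys nothing while multiplying the load on the parent.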