Silent Relay Failure

(imported topic written by SystemAdmin)

I’m consistently seeing a relay anomaly on 8.2.1093 that I haven’t seen in previous versions. The relay service is running and the process is in memory, but it isn’t doing anything. No errors are logged in either the relay log or the Windows Application log. However, agents and upstream relays apparently detect that something is wrong and stop talking to it entirely. In this state, running netstat -an shows that the relay is listening on 52311, but there are no conversations with other machines. After cycling the relay, netstat -an shows dozens, maybe hundreds, of connections to agents and upstream relays.

The clue I watch for is a large number of agents suddenly reporting directly to the central server instead of to their local or central relays. It isn’t uncommon to find several relays, frequently central relays, in this state at the same time.

Any ideas? Has anyone else seen this behavior?

(imported comment written by SystemAdmin)

The throttling setting _BESRelay_HTTPServer_MaxConnections on 8.2.x relays appears to be the culprit. Throttle max connections and the relay silently goes out to lunch over a period of hours or days, depending on demand. The exe remains in memory and the service is running, but nothing can connect to it. netstat confirms that no connections are being made. This setting worked fine in previous relay versions.

I’m experimenting with some other throttling settings to see if they work and if I can achieve the same basic result.

(imported comment written by RubenB.)

Has anyone found a fix for this? I just upgraded to version 8.1.634.0, and 14 of my relays disappear after a few hours. When I connect to the machine, I see both services (Client and Relay) running. If I stop the relay service, the machine comes back online.

Thanks.

(imported comment written by SystemAdmin)

Are you using the throttling setting _BESRelay_HTTPServer_MaxConnections? If so, set it to a large number, then cycle the relay. Or avoid that setting altogether in favor of other throttling options. I suspect that once the max connections threshold has been reached, the relay doesn’t know what to do with additional incoming connections, which end up effectively DDoSing the relay. The relay apparently then abends but remains in memory, which is very misleading from a troubleshooting perspective. Use netstat -an to find out whether the relay is actively talking to other machines. If it is listening on 52311 but isn’t communicating with any agents or relays, it has basically abended. Normally you should see a long list of connections in various states on 52311.
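A minimal sketch of the netstat check described above, assuming Windows-style `netstat -an` output (four columns: Proto, Local Address, Foreign Address, State) and the default port 52311. It counts TCP sockets whose local or foreign port is 52311, grouped by state, and calls the relay "stalled" when it is listening but has no connections in any other state. The function names (`summarize`, `diagnose`) and the three verdicts are hypothetical, for illustration only; they are not part of BigFix.

```python
import subprocess
from collections import Counter

RELAY_PORT = "52311"  # default relay port mentioned in this thread


def port_of(address: str) -> str:
    # "10.0.0.5:52311" -> "52311", "[::]:52311" -> "52311", "*:*" -> "*"
    return address.rsplit(":", 1)[-1]


def summarize(netstat_output: str, port: str = RELAY_PORT) -> Counter:
    """Count TCP sockets whose local or foreign port matches `port`, keyed by state."""
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Assumes Windows layout: TCP rows have exactly 4 columns; header and UDP rows are skipped.
        if len(parts) != 4 or not parts[0].upper().startswith("TCP"):
            continue
        _, local, foreign, state = parts
        # Local port 52311 = inbound from agents/child relays; foreign 52311 = outbound to upstream relays.
        if port_of(local) == port or port_of(foreign) == port:
            states[state] += 1
    return states


def diagnose(states: Counter) -> str:
    """'not listening', 'stalled' (listening, no conversations) or 'active'."""
    if states["LISTENING"] == 0:
        return "not listening"
    active = sum(n for state, n in states.items() if state != "LISTENING")
    return "active" if active else "stalled"


if __name__ == "__main__":
    try:
        output = subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout
    except OSError:
        output = ""  # netstat not available on this machine
    states = summarize(output)
    print(dict(states))
    print(diagnose(states))
```

Run it on the relay itself. A verdict of `stalled` corresponds to the state described in this thread, where the relay listens on 52311 but talks to nobody until it is cycled.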

(imported comment written by SystemAdmin)

We are on BigFix version 8.0.584.0 and have the same problem. The documentation on “_BESRelay_HTTPServer_MaxConnections” is very limited and doesn’t list any criteria for choosing a better value than the default of 1024.

Any suggestions?

Greg Messemer