Relay using loopback address

I hope I put this in the correct category.

We have a fairly complex, multi-tenant setup: more than 130 customers, each with three relay affiliation lists. The first list contains the relays in the customer's environment, the second contains my company's internal relays (accessible to customers through a VPN tunnel to our services), and the third contains internet-facing relays.

Some of our customers have remote locations with very slow links (3 Mbps or 5 Mbps circuits), and these sites sometimes go down for a day or so. When this happens, the client on the on-site relay cannot reach any of the parent relays. It works through the whole list and then falls back to our main server, which the site cannot reach, even through the VPN tunnel. When the connection is re-established, the client tries only the main server, and the agent ends up pointing back to itself through the loopback connection.

What are we doing wrong? In my opinion, the client on a relay should never use the loopback address: if the client can't connect, cycling back to itself will never work.

To fix this, I have to disable the relay service, add a hosts file entry that points our main server URL at an accessible relay, restart the client and wait for it to report in, then remove the hosts file entry and restart the relay. This is a lot of work and time-consuming.
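The hosts-file part of that workaround can be scripted so the temporary entry is always added and removed the same way. The sketch below only manipulates the file's text; the marker comment, the IP `10.0.0.5` and the hostname `bigfix.example.com` are hypothetical placeholders, and stopping/restarting the relay and client services is deliberately left out because it depends on your platform.

```python
# Hypothetical tag appended to the override line so it can be found and removed later.
MARKER = "# temporary relay override"


def add_override(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts_text with a tagged line mapping hostname to ip appended."""
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{hostname}\t{MARKER}\n"


def remove_override(hosts_text: str, hostname: str) -> str:
    """Return hosts_text without the tagged override line(s) for hostname."""
    kept = [
        line
        for line in hosts_text.splitlines(keepends=True)
        if not (line.rstrip().endswith(MARKER) and hostname in line.split())
    ]
    return "".join(kept)
```

In practice you would read and write `/etc/hosts` (Linux) or `C:\Windows\System32\drivers\etc\hosts` (Windows) with administrator rights, calling `add_override` before restarting the client and `remove_override` once it has reported in. Tagging the line means the cleanup step cannot accidentally remove a permanent entry for the same host.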

Is this normal behavior?

This is normal behaviour: the client on a relay always uses that relay itself, connecting to it through the loopback address.

While the relay cannot reach its parent relays, it buffers the client reports so it can pass them on when communications are restored. The default buffer size is quite small, but it can be increased (with care) if outages last long enough to fill it.
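To judge whether a day-long outage is likely to fill the buffer, a rough back-of-the-envelope estimate helps. The sketch below assumes report volume grows linearly with clients, report size, report frequency and outage length; all the figures are hypothetical inputs you would replace with numbers observed at your own sites, and nothing here reads real relay settings or defaults.

```python
def buffer_needed_bytes(clients: int, report_bytes: int,
                        reports_per_hour: float, outage_hours: float) -> float:
    """Crude linear estimate of report data queued on a relay during an outage."""
    return clients * report_bytes * reports_per_hour * outage_hours


def buffer_will_overflow(buffer_limit_bytes: int, **workload) -> bool:
    """True if the estimated backlog exceeds the configured buffer limit."""
    return buffer_needed_bytes(**workload) > buffer_limit_bytes


if __name__ == "__main__":
    # Hypothetical remote site: 50 clients, ~20 KB per report,
    # 4 reports per hour, 24-hour outage.
    needed = buffer_needed_bytes(clients=50, report_bytes=20_000,
                                 reports_per_hour=4, outage_hours=24)
    print(f"{needed / 1_000_000:.1f} MB")  # 96.0 MB
```

If the estimate comes out well above the current buffer limit, that site is a candidate for the careful tuning described above; if it is far below, the buffer is probably not the problem.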