As for the original problem: the Server/Relay logs an error when trying to gather a site that no longer exists.
I’ve done quite a bit of research on this at some large customers with long-running deployments and have seen it pop up a number of times.
Something to note is that the log message is really the only impact. When only a few sites are noted, and the sites don’t exist, it’s safe to ignore the message. I actually wish we could suppress the message, or at least make it a “softer” one, because it’s not usually indicative of a real issue and usually no action needs to be taken on it.
But if it’s a huge number of sites, or the messages are causing grief / making it harder to identify real problems, there are some things we can do.
There are usually two factors to consider:
- There may still be clients attempting to gather the sites. This is actually the rarer case, but we can have clients that never unsubscribed from a site that was removed and keep asking their Relay to gather the site for them. The “documented” way to fix this is to remove the __BESData folder from the client and allow the client to reset to a clean state. That’s fine if it’s a small number of clients, you can identify the clients to begin with, and you’re ok with the client resetting (and potentially re-running older actions that are still open).
- There may be no clients still trying to gather the sites, but Relays try to refresh the site versions at every startup, for any site that any client has ever tried to gather from the Relay. Deleted sites never get cleaned up from the Relay by default. The documented way to fix this is via a Gather State Reset on the Relay. This needs to be done in a specific sequence, from the lowest-level Relays first up to the top-level Relays, because a child Relay requesting the site would put it right back into the parent Relay’s gather list.
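To make the reset sequence concrete, here is a rough Python sketch that only works out the order. It assumes you already have a mapping of each Relay to its parent Relay from your own inventory. The relay names and the `gather_reset_order` helper are hypothetical; the code does not call any BigFix API and does not perform the reset itself.

```python
def gather_reset_order(parent_of):
    """Return relay names ordered deepest-first, so every child precedes its parent.

    parent_of maps each relay name to its parent relay name, or to None when
    the relay reports directly to the root server (assumed input format).
    """
    def depth(relay):
        # Count hops up the chain until we run out of known parents.
        hops = 0
        seen = {relay}
        current = parent_of[relay]
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle in relay hierarchy at {current!r}")
            seen.add(current)
            hops += 1
            # A parent missing from the mapping is treated as the top of the chain.
            current = parent_of.get(current)
        return hops

    # Deepest relays first; ties broken by name to keep the output stable.
    return sorted(parent_of, key=lambda relay: (-depth(relay), relay))


if __name__ == "__main__":
    # Hypothetical three-level hierarchy.
    hierarchy = {
        "top-relay": None,
        "mid-relay-a": "top-relay",
        "mid-relay-b": "top-relay",
        "leaf-relay-1": "mid-relay-a",
        "leaf-relay-2": "mid-relay-a",
    }
    for name in gather_reset_order(hierarchy):
        print(name)
```

Sorting by depth is stricter than strictly necessary (a Relay only has to come after its own children), but it keeps the plan easy to read: finish one level everywhere before moving up to the next.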
There are some undocumented settings that you could work with Support to use to clean up the site gathers, short of performing full gather resets, but because there are some risks involved they’re not documented publicly.