Session Relevance Query Needed for Relay Health

(imported topic written by SystemAdmin)

We recently had an issue where a route to a relay server was deleted, and we didn’t know about it until 0 clients were reporting to that relay (which is another relay health check we have).

What I would like to do is put together session relevance code that checks the relays and identifies which ones have not had a client report to them in over 24 hours. The report would list those problem relays and the last time a client reported to each.

Can anyone assist?

(imported comment written by Lee Wei)

Here is a possible solution.

This statement gives you all the Relays in your environment:

unique values of values of results of bes property "Relay"

This statement gives you all the Relays in your environment that have at least one computer that has reported in the last day:

unique values of values of results from (bes computers whose (now - last report time of it < 1 * day ) ) of (bes property "Relay")

Using the set subtraction operator, we can subtract the second set from the first to get the Relays with no computers reporting in the last day.

elements of (
    set of unique values of values of results of bes property "Relay" - 
    set of unique values of values of results from (bes computers whose (now - last report time of it < 1 * day ) ) of (bes property "Relay")
)
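
For anyone who wants to check the set logic outside the console, here is a rough sketch of the same subtraction in Python. It is only an illustration: the function name, the (relay, last report time) input format, and the sample relay names in the usage note are all made up for this example and do not come from any BigFix API. It also keeps the most recent report time for each problem relay, since the original question asked for that.

```python
from datetime import datetime, timedelta


def stale_relays(reports, now, max_age=timedelta(days=1)):
    """Return {relay: latest report time} for relays with no recent client report.

    `reports` is a list of (relay_name, last_report_time) pairs, one per client.
    This input shape is an assumption made for the sketch.
    """
    # Set 1: every relay that any client lists as its relay.
    all_relays = {relay for relay, _ in reports}

    # Set 2: relays with at least one client that reported within max_age.
    active_relays = {relay for relay, seen in reports if now - seen < max_age}

    # Set 1 minus set 2: relays with no client report within max_age.
    stale = all_relays - active_relays

    # For each stale relay, keep the most recent client report time.
    latest = {}
    for relay, seen in reports:
        if relay in stale and (relay not in latest or seen > latest[relay]):
            latest[relay] = seen
    return latest
```

With hypothetical data where "relay-a" has a client that reported two hours ago and the newest report for "relay-b" is two days old, only "relay-b" comes back, paired with that two-day-old timestamp.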

Lee Wei

(imported comment written by SystemAdmin)

Thank you Lee. That second statement was key. I was trying to attack this in a different way.

(imported comment written by SystemAdmin)

I modified the overall query for 3 reasons. I wanted to post it so others can use it as well.

  1. We use Name Override with IP addresses, so I wanted to remove all name references. This is one reason why I changed the first statement.

  2. We have offline clients whose last-known relay no longer exists, so those relays were getting stuck in the output of the first statement. This is the other reason I changed the first statement.

  3. I wanted to exclude relays that were themselves not reporting in, because that would cause their clients to stop reporting as well. We have another report that alerts us when a relay is not checking in.