Cleaning out old sites in BigFix relays

RichardKav · March 22, 2020, 8:25pm

Hey Folks,

I see a lot of old posts about this topic, as well as some official documentation:
https://www.ibm.com/support/pages/node/242623

I’m wondering if someone can help clarify this part in particular:
"To resolve the problem, you will need to reset the relays in your deployment. It is required that you do this in a specific order. You first reset the bottom level relays (the relays closest to the clients)."

In my deployment we have multiple top-level relays, and customers stand up their own local relays as well when needed. I can follow the above procedure for the levels of relays I control, but I can’t coordinate this action with all my customers’ relays. Would this process still help, or would the lower relays simply propagate the stale data back up the chain?

Thanks!

JasonWalker · March 22, 2020, 9:24pm

If any of the lower-level relays are attempting to gather the stale sites, then yes that stale request will be re-propagated back up the relay chain.

What occurs here is that the relay keeps a list of every site a client (or child relay) has requested, so it can track the version of the local copy of the site that it has cached. When a site is removed, clients unsubscribe and should no longer be requesting that site; but the relay still tries to query upstream to check for a new version of the stale site, at relay startup and possibly periodically after that.

Once a child relay asks for a new version of the site, the parent relay also adds the site back to the list of sites that it will check in the future.
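To make the propagation concrete, here is a toy Python model of the behaviour described above. It is a sketch under stated assumptions, not BigFix code: the `Relay` class, its `request`/`refresh`/`reset` methods, the relay names and the site name `OldSite` are all hypothetical, and `reset` simply stands in for whatever the official cleanup procedure does. Each relay remembers every site requested from below and, on each refresh cycle, re-asks its parent for all of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Relay:
    """Hypothetical toy relay that remembers every site requested from below."""
    name: str
    parent: Relay | None = None
    tracked_sites: set[str] = field(default_factory=set)

    def request(self, site: str) -> None:
        # A request from a client or child relay adds the site to this relay's
        # tracked list, and the relay passes the request on to its own parent.
        self.tracked_sites.add(site)
        if self.parent is not None:
            self.parent.request(site)

    def refresh(self) -> None:
        # Startup/periodic check (modelled, not real BigFix behaviour): query
        # upstream for every tracked site, including ones nobody subscribes to.
        if self.parent is not None:
            for site in sorted(self.tracked_sites):
                self.parent.request(site)

    def reset(self) -> None:
        # Stand-in for the cleanup procedure: forget all cached site references.
        self.tracked_sites.clear()


def build_chain() -> list[Relay]:
    """Return [bottom, mid, top]; a client once subscribed to 'OldSite' via bottom."""
    top = Relay("top")
    mid = Relay("mid", parent=top)
    bottom = Relay("bottom", parent=mid)
    bottom.request("OldSite")
    return [bottom, mid, top]


def cleanup(relays: list[Relay], order: list[str]) -> dict[str, list[str]]:
    """Reset the named relays in the given order; after each reset, every relay
    runs one refresh cycle. Returns each relay's tracked sites afterwards."""
    by_name = {r.name: r for r in relays}
    for name in order:
        by_name[name].reset()
        for relay in relays:
            relay.refresh()
    return {r.name: sorted(r.tracked_sites) for r in relays}


# Top-down: child relays re-teach their parents the stale site.
print(cleanup(build_chain(), ["top", "mid", "bottom"]))
# {'bottom': [], 'mid': ['OldSite'], 'top': ['OldSite']}

# Bottom-up: nothing below is left asking, so the chain ends up clean.
print(cleanup(build_chain(), ["bottom", "mid", "top"]))
# {'bottom': [], 'mid': [], 'top': []}

# Only the upper relays reset; an untouched lower relay keeps re-requesting.
print(cleanup(build_chain(), ["mid", "top"]))
# {'bottom': ['OldSite'], 'mid': ['OldSite'], 'top': ['OldSite']}
```

In this model only the bottom-up order leaves the chain clean, and leaving a lower relay untouched lets the stale site climb back to the top, which is the re-propagation described above. Real relays refresh on their own schedule, so treat this as an illustration of ordering, not of timing.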

Unless you have dozens or hundreds of these stale sites, it’s not really a problem, but it does add a lot of messages to the relay log, which can make it harder to tell whether there’s a real problem.

If you are letting subordinate sites deploy their own relays, you might share the cleanup procedure with them and ask them to go through it once a year or so. Then you can do the same on your higher-level relays every few months; eventually it’ll all get cleaned out.

RichardKav · March 24, 2020, 1:57pm

Cheers Jason, that’s very useful to know.

Meydey · March 24, 2020, 6:04pm

This is a problem that needs a better, automated solution, via a Fixlet or otherwise. I can access all my upper-level relays fine, but RDP is not an option for some of my lower-level/PCI VLAN relays. The whole reason for a relay in those VLANs is to provide a single collector/egress point through any firewalls and back up the relay chain; by design they are walled off from easy access. Manually RDPing or consoling into those relays is not an easy or efficient task.
I am at around 60 relays now, and having to touch them directly for a cleanup is not optimal.

TimRice · March 24, 2020, 7:48pm

There is an Enhancement Request pending; please go vote for it and see if someone at BigFix can get it done in the near future.

https://bigfix-ideas.hcltechsw.com/ideas/BFLCM-I-48

Meydey · March 24, 2020, 8:05pm

Done.

RichardKav · March 25, 2020, 11:49am

Does anyone have any insight into the differences between

and
https://www.ibm.com/support/pages/clear-out-obsolete-and-problematic-site-references-which-cause-too-many-sockets-remain-open-timewait-state-relays

The former recommends deleting the __BESData folder and all of its contents, and then gives a slightly different order for restarting the client/relay services afterwards.

LouC · April 28, 2020, 7:53pm

These links all appear to be broken now. Would someone mind posting the instructions if they happen to have them handy? I’m seeing this same error on one of my relays. What exactly is the correct process to remove devices from a site prior to deleting it? And what is the cleanup procedure?

Thank you!

bradsexton81 · January 25, 2023, 4:43pm

Here is the new link for the gather site reset

https://support.hcltechsw.com/csm?id=kb_article&sysparm_article=KB0023994