Gather status keeps failing

I have an issue with the Gather Status and Actionsite Versions checks failing in my BigFix health checks. I have inherited a BigFix environment with little documentation about what has been going on. There are many issues I have to overcome, but that is a topic for another post. I keep clearing out these errors only to have them come back the next day. What bothers me most is that one of the errors in the Gather Status section is on our primary relay. I could use some advice on what could be causing these errors to keep coming back. I should also point out that the same machines are not always affected.

How are you clearing out the errors? If the errors reflect sites that no longer exist, and you see the most errors on your top-level relays, then this is somewhat expected, since lower-level relays will simply re-request those sites at a later interval. You would need to run a cleanup process on each relay, starting at the bottom of the hierarchy and working up, to fully clear those errors.
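To make the "bottom up" order concrete: if you write down your relay hierarchy as a child-to-parent map, you can sort relays deepest-first so every child is cleaned before its parent. This is only a sketch of the ordering logic; the relay names are hypothetical, you would fill in the map by hand from your own deployment, and it does not call any BigFix API or perform the cleanup itself.

```python
# Hypothetical relay hierarchy: each relay maps to its parent relay.
# None marks a top-level relay (one that reports directly to the server).
PARENTS = {
    "relay-top": None,
    "relay-mid-a": "relay-top",
    "relay-mid-b": "relay-top",
    "relay-leaf-1": "relay-mid-a",
}

def depth(relay, parents):
    """Count the hops from this relay up to a top-level relay."""
    d = 0
    while parents[relay] is not None:
        relay = parents[relay]
        d += 1
    return d

def bottom_up_order(parents):
    """Return relays deepest-first (ties broken by name), so children precede parents."""
    return sorted(parents, key=lambda r: (-depth(r, parents), r))

for relay in bottom_up_order(PARENTS):
    print("clean up:", relay)
```

Cleaning in this order means no lower-level relay is left to re-request a stale site from a parent you have already cleaned.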

If the errors are for active sites, they are more concerning, and you should investigate where in the chain the problem is introduced (which tier of relay sees it first). The actionsite version mismatches fall into this category. A version that is one behind could just be a timing issue with when the relay reports its updated version info, but the LONMSG relay that is several versions behind needs to be fixed. You can try the quick and dirty approach of uninstalling the relay, deleting its folders, and reinstalling it. Alternatively, there is a gather reset process you can attempt with the help of IBM support or a knowledge doc we have.
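The rule of thumb above (one version behind may be timing, several behind needs fixing) can be expressed as a small triage function. This is a sketch under stated assumptions: the version numbers are invented for illustration, you would read the real values from your own health check or relay diagnostics, and the threshold of one version is the heuristic from this reply, not a documented BigFix limit.

```python
# Hypothetical snapshot: the actionsite version the server currently publishes,
# and the version each relay last reported. All numbers are made up.
CURRENT_VERSION = 4210
REPORTED = {
    "relay-top": 4210,
    "relay-mid-a": 4209,
    "LONMSG": 4203,
}

def classify_lag(current, reported):
    """Label a relay by how many actionsite versions it trails the server."""
    lag = current - reported
    if lag <= 0:
        return "current"
    if lag == 1:
        return "one behind: possibly a reporting-timing issue"
    return f"{lag} behind: needs attention"

for relay, version in REPORTED.items():
    print(relay, "->", classify_lag(CURRENT_VERSION, version))
```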

I have run the Gather Reset option several times and the issue comes back, and not always on the same sites. One site that I think is causing the issue is our main BigFix relay. It fails the gather status check and the gather reset doesn't fix it, which leaves uninstalling, deleting the folders, and reinstalling. Hopefully the others will fall into place after that.

I think you are using the word ‘site’ to represent a physical location of a relay, but I am referring to our content sites that are gathered by the relays and are represented in that dashboard when they fail to gather. The relay with 33 errors means that it failed to gather 33 different content sites. You would have to look at the gather status page for each relay to understand which sites are failing (old vs active) and if there is any consistency across relays. These are relatively advanced troubleshooting steps, so if you don’t have much experience with BigFix, I’d recommend opening a PMR to make sure you’re interpreting the errors and using the reset process correctly.
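Once you have the list of failing content sites from each relay's gather status page, comparing them is simple set arithmetic: sites failing on every relay point to a common cause, and sites that are not in your list of active sites are likely old ones that need cleanup rather than repair. The relay and site names below are hypothetical placeholders you would replace with what you copy from those pages; nothing here queries BigFix directly.

```python
# Hypothetical data copied by hand from each relay's gather status page.
FAILED = {
    "relay-top": {"SiteA", "SiteB", "OldSiteX"},
    "relay-mid-a": {"SiteA", "OldSiteX"},
    "relay-leaf-1": {"OldSiteX"},
}
# Hypothetical list of content sites that are still active in the deployment.
ACTIVE_SITES = {"SiteA", "SiteB", "SiteC"}

error_counts = {relay: len(sites) for relay, sites in FAILED.items()}
failing_everywhere = set.intersection(*FAILED.values())
all_failing = set.union(*FAILED.values())
stale_sites = all_failing - ACTIVE_SITES   # old sites: cleanup candidates
active_failing = all_failing & ACTIVE_SITES  # active sites: investigate

print("errors per relay:", error_counts)
print("failing on every relay:", sorted(failing_everywhere))
print("old sites:", sorted(stale_sites))
print("active sites failing:", sorted(active_failing))
```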
