Gather State Reset Process "incomplete"?

IanDM · December 17, 2021, 10:31pm

I was seeing a number of departed operators showing up in gather errors on clients and relays, so I performed a gather state reset on the master server (KB0023994) and on the relays (KB0079078) as well.

Most of the entries are gone, but one for opsite142 was not cleaned up and one for opsite199 that was not showing up previously is now showing up as needing a cleanup.

Should a single gather state reset be “enough”, or could multiple executions be required to resolve all issues?

Before cleanup
Failed:

Id: 17	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite141
Error Message: 17: 17NotASignedMessage

Id: 20	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite142
Error Message: 20: 17NotASignedMessage

Id: 74	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite150
Error Message: 74: 17NotASignedMessage

Id: 75	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite153
Error Message: 75: 17NotASignedMessage

Id: 69	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite183
Error Message: 69: 17NotASignedMessage

Id: 70	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite197
Error Message: 70: 17NotASignedMessage

Id: 71	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite202
Error Message: 71: 17NotASignedMessage

Id: 80	Date: Fri, 17 Dec 2021 15:13:31 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite214
Error Message: 80: 17NotASignedMessage

After cleanup
Failed:

Id: 32	Date: Fri, 17 Dec 2021 22:21:02 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite142
Error Message: 32: 17NotASignedMessage


Id: 46	Date: Fri, 17 Dec 2021 22:21:02 +0000
Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite199
Error Message: 46: 17NotASignedMessage
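
For anyone comparing the two lists by hand, here is a minimal Python sketch that diffs them. It is not a BigFix tool; it assumes the "Failed:" output has been pasted into strings in the same `Url: .../bfgather.exe/<site>` layout shown above, and the helper names `failed_sites` and `compare` are made up for illustration.

```python
from __future__ import annotations

import re

# Captures the site name at the end of a bfgather URL line,
# e.g. "Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/opsite142".
SITE_RE = re.compile(r"^Url:\s+\S+/bfgather\.exe/(\S+)\s*$", re.MULTILINE)


def failed_sites(report: str) -> set[str]:
    """Return the site names listed in a pasted 'Failed:' gather report."""
    return set(SITE_RE.findall(report))


def compare(before: str, after: str) -> dict[str, list[str]]:
    """Classify sites as cleared, persisting, or new between two reports."""
    b, a = failed_sites(before), failed_sites(after)
    return {
        "cleared": sorted(b - a),
        "persisting": sorted(b & a),
        "new": sorted(a - b),
    }


def make_report(sites: list[str]) -> str:
    """Rebuild report text from site names (only to keep this example short)."""
    url = "Url: http://bigfixprod:52311/cgi-bin/bfgather.exe/"
    return "Failed:\n\n" + "\n\n".join(url + s for s in sites)


# Site names taken from the two lists in this post.
BEFORE = make_report(["opsite141", "opsite142", "opsite150", "opsite153",
                      "opsite183", "opsite197", "opsite202", "opsite214"])
AFTER = make_report(["opsite142", "opsite199"])

result = compare(BEFORE, AFTER)
for label, sites in result.items():
    print(f"{label}: {', '.join(sites)}")
```

Run against the lists above, this reports opsite142 as persisting and opsite199 as new, with the other seven sites cleared.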

JasonWalker · December 20, 2021, 5:56am

There are a lot of nuances to the gather reset process and some advanced settings that can be tuned. I recommend opening a Support Incident so the team can walk you through it.

One consideration is that once a Relay knows about a site’s existence, by default it will try to gather that site at startup, even if no client is still requesting that particular site. So a leaf-level Relay will try to gather from its parent, and the parent Relay will then try to gather the site at its own future restarts.
So a Gather State Reset needs to start at the bottom-level leaf relays, then work upward to reset the parent relays, and reset the root server last of all.
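
To illustrate the leaf-first ordering, here is a minimal Python sketch. It assumes you already have a map of each relay to its parent; the relay names and the `reset_order` helper are hypothetical, and nothing here calls BigFix itself. It only sorts the hierarchy so that every relay appears before its parent and the root server comes last.

```python
from __future__ import annotations


def reset_order(parent_of: dict[str, str | None]) -> list[str]:
    """Order nodes deepest-first so each relay precedes its parent.

    parent_of maps a node name to its parent's name; the root maps to None.
    Assumes the hierarchy is a tree (raises ValueError on a cycle).
    """
    def depth(node: str) -> int:
        d, seen = 0, {node}
        while parent_of[node] is not None:
            node = parent_of[node]
            if node in seen:
                raise ValueError("cycle in relay hierarchy")
            seen.add(node)
            d += 1
        return d

    # Deepest nodes first; ties broken by name for a stable, readable order.
    return sorted(parent_of, key=lambda n: (-depth(n), n))


# Hypothetical three-level deployment, for illustration only.
topology = {
    "root-server": None,
    "relay-top": "root-server",
    "relay-leaf-a": "relay-top",
    "relay-leaf-b": "relay-top",
}

order = reset_order(topology)
for step, name in enumerate(order, start=1):
    print(f"{step}. reset {name}")
```

Sorting by depth is enough here because a child is always exactly one level deeper than its parent, so it always sorts ahead of it.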

gpoliafico · December 20, 2021, 7:12am

The two gather reset procedures are complete as written. Repeating them shortly after one another is not useful; just make sure to finish a procedure once you have started it, and take care not to skip any step.

But I see two misunderstandings here.

The first is that the purpose of these procedures is not to clean up these messages. As the name says, they reset the ‘state’ of the server or the relay. They cannot prevent the previous ‘state’ from coming back if it is caused by something unrelated to the server or the relay itself: for example, on the server side, a problem with gathering the sites, or, as is probably the case here on the relay side, at least one agent that has not received the notification that an operator has been deleted.

The second misunderstanding concerns the messages themselves: they are not errors, do not need to be removed, do not cause any problem, and are not a symptom of an issue, especially when there are only a few of them, as in this case.