Solved! Endpoints are unsubscribing from sites, deleting them, and subscribing again later, adding them back

I’m putting this under Server Automation because I am launching my jobs through SA, using a particular account. I am pushing to roughly 3000 endpoints each night with 30-40 different steps from SA. Every night, I am seeing around 20 or more “failures”, which show up as non-relevant. I don’t actually have any relevance on the fixlet itself, but when SA launches the task, its relevance checks for the opsite of the account I use to launch the automation plan, and does not find it. This is very strange considering that I have not changed any site settings or computer groups for some time now. I looked into this, and I am seeing some endpoints unsubscribe themselves from a handful of custom sites and then, hours later, re-subscribe. I’m not sure what is causing this, but I am still investigating. Any help would be appreciated, thank you.

Hi @heagsta,

So am I right in thinking that you see something like this in the relevance for the steps in that plan?:

exists administrator whose (name of it as string = "__op_123") of client AND (<original-relevance>)

and it’s this that’s causing the non-relevance result?

So the automation plan engine dynamically updates the relevance of steps from plans that are executed by non-master operators. The reason for this is security - we do not want plans executed by non-master operators to perform work on computers that the non-master operator does not have admin rights to.

The non-master operator can only target steps at computers which they have admin rights over. The reason we need to do this is because the plan engine always creates step actions as a master operator (who will automatically have admin rights over all computers), so this extra check is necessary.

To get over this, check to make sure the non-master operator that initiated the plan has admin rights over all the computers that are returning a non-relevant result.

Cheers,
Paul.

I don’t believe the problem is SA. The underlying issue is that my endpoints are “randomly” unsubscribing from sites and opsites; please see an example below. This behavior occurs for multiple sites as well, and notice that the dates are from December of last year?? I don’t recall ever setting a policy to add or remove sites.

At 12:58:38 -0800 - 
   ActionLogMessage: (action:50900) ending action
   ActionLogMessage: (action:205559) Action signature verified for Execution
   ActionLogMessage: (action:205559) starting action
At 12:58:39 -0800 - actionsite (http://IEMSERVER..com:52311/cgi-bin/bfgather.exe/actionsite)
   Command succeeded setting "__Group___AdminBy___op_165"="False" on "Wed, 16 Dec 2015 15:30:32 +0000" for client (action:205559)
At 12:58:39 -0800 - opsite165 (http://IEMSERVER..com:52311/cgi-bin/bfgather.exe/opsite165)
   Removing operator site (operator no longer valid)
At 12:58:39 -0800 - actionsite (http://IEMSERVER..com:52311/cgi-bin/bfgather.exe/actionsite)
   Command succeeded administrator delete "__op_165" on "Wed, 16 Dec 2015 15:30:32 +0000" (action:205559)
   Command succeeded (evaluated false) continue if { value of setting "__Group___AdminBy___op_165" of client = "True" } (action:205559)

At 00:58:02 -0800 - 
   Adding operator site (__op_165)
At 00:58:02 -0800 - actionsite (http://IEMSERVER.com:52311/cgi-bin/bfgather.exe/actionsite)
   Command succeeded administrator add "__op_165" on "Wed, 16 Dec 2015 15:30:32 +0000" (action:205559)
   Not Relevant - Assign and Revoke Management Rights For __op_165 (fixlet:205559)
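
For anyone who wants to measure how long an endpoint stays without its operator site, here is a minimal Python sketch that scans log lines shaped like the excerpt above. It is only an illustration, not a BigFix tool: the function names are made up, it keys on the two message strings shown in this excerpt, and since the excerpt carries times but no dates it assumes that a timestamp earlier than the previous one means the log has rolled over to the next day.

```python
import re
from datetime import datetime, timedelta

# Matches the "At HH:MM:SS -0800 - ..." header lines seen in the excerpt.
HEADER = re.compile(r"^At (\d{2}:\d{2}:\d{2}) [+-]\d{4} -")
REMOVED = "Removing operator site"
ADDED = "Adding operator site"


def opsite_events(lines):
    """Yield (timestamp, 'removed' | 'added') for operator-site changes."""
    day = datetime(2000, 1, 1)  # placeholder date: the excerpt has times only
    last = None
    current = None
    for line in lines:
        m = HEADER.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S").time()
            current = datetime.combine(day.date(), t)
            # Assumption: a time earlier than the previous one means a new day.
            if last is not None and current < last:
                day += timedelta(days=1)
                current = datetime.combine(day.date(), t)
            last = current
        elif current is not None and REMOVED in line:
            yield current, "removed"
        elif current is not None and ADDED in line:
            yield current, "added"


def outages(lines):
    """Return the gap between each removal and the next re-add."""
    gaps, removed_at = [], None
    for when, kind in opsite_events(lines):
        if kind == "removed":
            removed_at = when
        elif removed_at is not None:
            gaps.append(when - removed_at)
            removed_at = None
    return gaps


# Two header/message pairs copied from the excerpt.
sample = [
    "At 12:58:39 -0800 - opsite165 (http://IEMSERVER..com:52311/cgi-bin/bfgather.exe/opsite165)",
    "   Removing operator site (operator no longer valid)",
    "At 00:58:02 -0800 - ",
    "   Adding operator site (__op_165)",
]

for gap in outages(sample):
    print("operator site missing for", gap)
```

Run against a full client log, this would list one gap per removal that is later followed by a re-add, which makes it easier to see whether the outages cluster around a particular time of day.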

Hmmm… I see.

I think maybe you’ll want to push this into a separate category in that case, a lot of folks unfamiliar with SA may not read it if it appears in this section.

For my own part, I don’t really know what’s happening there. That’s not a behaviour I’ve seen before.

Ok, moving it out of SA, thank you.

After reading up on this a bit more, it seems like I may have an issue with hidden actions; I’m not sure how long it has been going on exactly. Still waiting on support for any information on this.

I am trying to figure out how to look at hidden actions. I found a thread or two on this, but the steps are not working. Does anyone know how to show all actions, regardless of whether they are normally hidden from the console, so that I can see manual group actions, site subscription actions, and so on? I am seeing that a lot of these might be the result of the same action, which appears to be from December of last year.

!!UPDATE!! The ShowAllActions setting did show the manual group actions, but nothing else. I am still trying to locate the subscription actions and the opsite actions (deletions, removals, etc.).

Still waiting for a response on the PMR I put in. However, I think the source of this is that Client Role subscriptions are unsubscribing, and during the unsubscribe, all of the site content associated with the role is being unsubscribed/deleted. Then something causes it to become relevant again (this time it took about 12 hours), and the Client Role is subscribed to again and all of the content within that role, including sites, is downloaded once again. Very strange; hopefully I hear back soon.

Are you running in a DSA server configuration? I had this subscribe / unsubscribe occur once during a DSA promotion, when the new secondary DSA was publishing site content before the operator management groups were replicated.

How are you assigning operator rights (Groups? Roles? LDAP-defined groups or roles?) If there’s a condition that can toggle, like the BES server failing to reach an LDAP account source, that could cause problems like this.

This could also happen if you have something complicated in your site subscription criteria.

I do not have a DSA server. I will take a look at my site criteria, but I’m pretty sure we use dynamic groups (“is a member of”) for just about all of them. I will go through that and the group criteria, but I’m not sure how it would toggle; they are straightforward relevance statements. Thank you for the info, I will investigate further.

Interestingly, tracing this back some more, I found that the difference between affected and unaffected endpoints comes down to a single group membership. The relevance for this group is almost identical to the relevance for the group of machines that are not affected by the issue. The unaffected group has a line that states DNS name contains “.car.mart.com”, which works just fine, and the problematic group states DNS name contains “car”. I cannot for the life of me see why this would make a difference, but I will be changing it shortly to see if this fixes the issue.

I did confirm that while a machine is unsubscribed, its DNS property is no different from that of a non-affected machine. I will report back once I get some results. Thanks for the help.

The obvious difference between those: if my DNS name were “carriage.something.com”, then contains “car” would match, but contains “.car.mart.com” would not.

You’ve got to be careful with very short text matches.
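
To make that concrete, here is a small Python sketch using plain substring tests as a stand-in for the relevance `contains` check. The host names and the `matches` helper are hypothetical and only there for illustration; the two patterns are the ones from the groups discussed above.

```python
# Hypothetical host names; only the two patterns come from this thread.
hosts = [
    "web01.car.mart.com",
    "carriage.something.com",
    "db02.oscar.example.net",
]


def matches(pattern):
    """Return the hosts whose name contains the pattern as a substring."""
    return [h for h in hosts if pattern in h]


for pattern in ("car", ".car.mart.com"):
    print(f"{pattern!r}: {matches(pattern)}")
```

The short pattern picks up all three names, including the two that have nothing to do with the intended domain, while the longer pattern picks up only the first.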

Ok, I have now located the issue, and I should have remembered this but forgot. When we use the Active Directory Path property on our servers, it returns a value only intermittently, not consistently. I had used it as part of the selection criteria for a group that drives site subscriptions, so whenever the property did not return, the sites would unsubscribe. When the Active Directory Path property returned again for the given endpoint, it would re-subscribe. With the Active Directory Path in the criteria, this issue exists on many machines, if not all machines.

The fix for us is to use the DNS property with contains ".car.mart.com".
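
For anyone trying to picture why this flaps, here is a rough Python model of the behaviour described above. It is only a sketch under stated assumptions: the function names, the "OU=Servers" criterion, and the sample values are all hypothetical, and it treats a property that fails to return as `None`, which makes the group criterion false.

```python
from typing import Optional


def in_group_by_ad_path(ad_path: Optional[str]) -> bool:
    # Hypothetical criterion on the AD path. When the property does not
    # return (modelled here as None), the criterion evaluates to False.
    return ad_path is not None and "OU=Servers" in ad_path


def in_group_by_dns(dns_name: str) -> bool:
    # The criterion from the fix: DNS name contains ".car.mart.com".
    return ".car.mart.com" in dns_name


def toggles(memberships):
    """Count how often membership flips between consecutive evaluations."""
    return sum(1 for a, b in zip(memberships, memberships[1:]) if a != b)


# Hypothetical successive evaluations on one endpoint: the AD path comes
# and goes, while the DNS name stays the same.
ad_path = "OU=Servers,DC=car,DC=mart,DC=com"
ad_samples = [ad_path, None, None, ad_path, None]
dns_samples = ["web01.car.mart.com"] * 5

ad_toggles = toggles([in_group_by_ad_path(p) for p in ad_samples])
dns_toggles = toggles([in_group_by_dns(d) for d in dns_samples])
print("AD path criterion flips:", ad_toggles)
print("DNS criterion flips:", dns_toggles)
```

Each flip in the first count corresponds to one unsubscribe or re-subscribe of every site tied to that group, which is why a property that is stable on the endpoint is the safer choice for subscription criteria.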

Sad to say, I had used this fix in the past when I had bleed-over into other groups from using the Active Directory Path, but I overlooked that property in this issue. I won’t forget to look next time. Hopefully this helps someone else who is having the same issue.
