DSA replication monitoring

Is there a good way to monitor that DSA replication is occurring successfully? The Filldb.log only seems to show errors and not anything like ‘replication completed’. Also, the REST “…/api/replication/servers” resource seems to do the same: it doesn’t record the goods, just the bads.

Any suggestions?

If you’re a Master Operator, in the BES Console you can select the “BigFix Management” domain, and use the “Deployment Health Checks” Dashboard. In particular the Actionsite Versions field should be the same (or at least close) across all of the DSA servers.

I’d expect there’s a way to pull this with Session Relevance so we can report it from a BES Web Reports server (and send alerts when needed), but I haven’t explored that yet.

Yeah, I need to be able to monitor it from a monitoring tool, so some kind of report access over HTTP is preferred. Like you said, maybe there is session relevance that pulls back the current actionsite version, and then I could compare that value across all root servers in my deployment?

Hmm… pulling
(names of it, versions of it) of bes sites

seems to show only the External Sites… I don’t get a result for the actionsite, custom sites, mailbox sites, or operator sites. Haven’t found a way to retrieve the actionsite version yet. If I check for only ‘names of it’, I get ActionSite added to the list, but no custom sites, mailbox sites, or operator sites. So ActionSite is present but doesn’t have a version, and the other sites aren’t retrieved by ‘bes sites’. Changing to ‘all bes sites’ adds the Custom Sites and my operator site, but still no mailbox sites (and still no version for ActionSite).

Let me know if you find something, I could use some DSA monitoring as well…

Within the /api/replication/servers REST API resource, there is a field for ‘LastReplication’ which is a date/time field, along with an ‘IsConnected’ field and ‘LastError’ for any errors should they occur. If ‘IsConnected’=1 for the given link, and there is no value within ‘LastError’, replication for that link can be considered successful.
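For anyone wiring this into a monitoring tool, here is a rough Python sketch of that check. It only encodes the rule above (‘IsConnected’=1 and an empty ‘LastError’). The sample XML is hypothetical: I'm assuming the fields hang off each ReplicationLink node either as attributes or as child elements, so compare it against the real response from your own server before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response; the real layout and values may differ.
SAMPLE = """<BESAPI>
  <ReplicationServer>
    <ReplicationLink>
      <IsConnected>1</IsConnected>
      <LastReplication>2015-01-01 00:00:00</LastReplication>
      <LastError></LastError>
    </ReplicationLink>
    <ReplicationLink>
      <IsConnected>0</IsConnected>
      <LastReplication>2015-01-01 00:00:00</LastReplication>
      <LastError>timeout</LastError>
    </ReplicationLink>
  </ReplicationServer>
</BESAPI>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{ns}Name' down to 'Name'."""
    return tag.rsplit("}", 1)[-1]


def field(link, name):
    """Read a field stored either as an attribute or as a child element."""
    if name in link.attrib:
        return link.attrib[name].strip()
    for child in link:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return ""


def link_statuses(xml_text):
    """Return one status dict per ReplicationLink node found in the XML."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter():
        if local_name(elem.tag) != "ReplicationLink":
            continue
        connected = field(elem, "IsConnected") == "1"
        last_error = field(elem, "LastError")
        results.append({
            # Healthy means connected and no recorded error.
            "healthy": connected and not last_error,
            "last_replication": field(elem, "LastReplication"),
            "last_error": last_error,
        })
    return results


if __name__ == "__main__":
    for status in link_statuses(SAMPLE):
        print(status)
```

Fetching the XML itself (authentication, certificate handling) is left out on purpose, since that depends on how your monitoring tool makes HTTP requests.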

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/RESTAPI%20Replication

I have seen other customers compare database sizes between primary and secondary.

Maybe querying the database and doing row counts would be a good comparison.

As well as querying the uploads and uploadmanagerdata directory on the server and doing directory and file counts…

However, this would not be a completely accurate way to determine replication status, because DSA does not replicate everything (deleted objects, for example). That is also a good reason to run the computer remover and audit trail cleaner tools on all replicas.

Checking database size is a very rough view for completion. Tracking changes to size in secondary databases over periods of time would also help to indicate there is movement and replication happening, but not a dead on “Completed” status.

If I did use the /api/replication/servers method, would I need to monitor that url for both the Primary AND DSA servers? If replication failed on the DSA, would the replication errors get passed back to the Primary root server? Essentially, doesn’t successful recording of the errors require replication to be working?

Yes, you should be monitoring both the Primary and DSA replica using this method for exactly the reason you state. Additionally, I recommend monitoring both replication links on each Server to verify replication is occurring as expected in both directions (assuming standard bi-directional replication).

Thanks, good info. But I’m curious how the Deployment Health Checks dashboard is retrieving the ActionSite versions from each DSA server and Relay. I think that’s a good statistic to monitor; it seems that each time I have DSA replication problems, one of the DSA servers stops seeing the other servers’ ActionSite versions incrementing.

Yes, I would like to see how that works too. A simple comparison of “current actionsite version” against all DSA root servers seems a little easier than parsing the /api/replication/servers XML results.

I looked and wasn’t able to find a session relevance inspector that returns the version of the actionsite though.

The Deployment Health Check dashboard appears to be getting the “ActionSite Version” via the DeploymentHealth.js, which is just reading it from the results of the property “Actionsite Version” in the “BES Health Checks” analysis that is in the BES Support site.

While monitoring the actionsite versions of the Root Servers is worthwhile in itself, I’d suggest that it’s not necessarily a good indicator of DSA replication status. For instance, because the actionsite is not constantly propagated/updated, you could have matching actionsite versions, but non-functional replication.

In the /api/replication/servers XML output, there are 2 <ReplicationLink…> nodes each with a node. And there are 2 <ReplicationLink…> nodes for each <ReplicationServer…>.

So in our environment, we have a Primary root server and 1 DSA secondary server. For those 2 servers, do I have to monitor all 4 of the nodes?

You don’t necessarily need to monitor all 4 nodes, no, but they can sometimes provide useful context.

What might help here is to monitor more specific resources (i.e. the replication links). For instance, if you want to validate the replication link where master is pulling from secondary, you could leverage the following resource (assuming serverid 0 = master, and serverid 1 = secondary/replica):

https://<master server>:52311/api/replication/server/1/link/0

And if you want to validate the replication link the other way around (secondary pulling from master):

https://<secondary server>:52311/api/replication/server/0/link/1
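If you are scripting this, a small Python helper along these lines could build both URLs. It is only a sketch: the hostnames are placeholders, the function names are made up, and it assumes both directions follow the same /server/<id>/link/<id> path pattern with serverid 0 = master and serverid 1 = secondary.

```python
def link_url(host, server_id, link_id, port=52311):
    """Build a replication link URL in the /server/<id>/link/<id> form."""
    return (
        f"https://{host}:{port}"
        f"/api/replication/server/{server_id}/link/{link_id}"
    )


def link_urls(master_host, secondary_host, master_id=0, secondary_id=1):
    """Return the two link URLs for a standard bi-directional pair."""
    return {
        # Queried on the master: master pulling from secondary.
        "master pulling from secondary": link_url(
            master_host, secondary_id, master_id
        ),
        # Queried on the secondary: secondary pulling from master.
        "secondary pulling from master": link_url(
            secondary_host, master_id, secondary_id
        ),
    }


if __name__ == "__main__":
    # Placeholder hostnames; substitute your own root servers.
    urls = link_urls("master.example.com", "secondary.example.com")
    for direction, url in urls.items():
        print(f"{direction}: {url}")
```

Each URL is queried on a different server, which matches the earlier advice to monitor both the Primary and the DSA replica.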

yes, that seems easier. thanks