Monitoring the BES Client?

cstoneba · August 8, 2016, 12:44pm

From the endpoint server, is there a way to monitor the BES Client to ensure it is reporting in? Not just that the service is running, but that it is actually posting results?

TimRice · August 8, 2016, 1:34pm

Short Answer : No. The server really only knows when a client HAS reported. It doesn’t really care if a client HAS NOT reported.

In the Console, you can tell that an Endpoint Agent isn’t reporting because, when an endpoint has not reported for a while (a configurable time), its entry in your Console switches from the normal black to light grey. This indicates that the client is “off-line”, or hasn’t reported in a while. You can sort your Computer view by the Last Report Time to see which ones have the oldest report times.

From Web Reports you can run a report to display computers that have a LAST REPORT TIME beyond a threshold of your choosing.

Other than that, I’m not sure what else you would like to see.

cstoneba · August 8, 2016, 1:49pm

Having a local timestamp on the client of the last time it successfully posted results upstream would give us another way to monitor client health, other than running a web report of the devices that haven’t reported in within the last x hours.

But I think you’re right. The only way for now is to run a report against WebReports for the endpoints that haven’t reported in within the last x hours.

AlanM · August 8, 2016, 7:55pm

There is an inspector for this

See https://developer.bigfix.com/relevance/reference/client.html#last-report-time-of-client-time

cstoneba · August 8, 2016, 8:06pm

The inspector might know that it posted upstream, but it wouldn’t know whether its Relay’s Relay is down or not, right? Meaning only asking the Console/WR would be 100%?

q: last report time of client
A: Mon, 08 Aug 2016 15:05:22 -0500
T: 0.022 ms

AlanM · August 8, 2016, 8:20pm

Correct, this is the time that the client posted to its relay. If the relay is not connected then it will not make it to the server.
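A local monitor could turn that inspector answer into an age check. The following is a minimal sketch, assuming the answer string has already been obtained from the client (that step is not shown) and that it uses the date format in the q/A output above. The function names and the 4-hour threshold are illustrative assumptions, not BigFix settings.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def report_age(answer: str, now: datetime) -> timedelta:
    """Return how long ago the client last posted to its relay."""
    # Parses strings such as "Mon, 08 Aug 2016 15:05:22 -0500"
    # into a timezone-aware datetime.
    last_report = parsedate_to_datetime(answer)
    return now - last_report


def is_stale(answer: str, now: datetime,
             max_age: timedelta = timedelta(hours=4)) -> bool:
    """True if the last report is older than max_age (hypothetical threshold)."""
    return report_age(answer, now) > max_age


if __name__ == "__main__":
    sample = "Mon, 08 Aug 2016 15:05:22 -0500"
    now = datetime.now(timezone.utc)
    print("age:", report_age(sample, now), "stale:", is_stale(sample, now))
```

As noted above, a fresh value here only shows that the client reached its own relay, not that the report reached the root server.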

cstoneba · September 20, 2016, 3:13pm

Ok, so then querying WebReports sounds like the only good way to accurately know the LastReportTime.

Now to complete the full loop: knowing that the client can successfully run an action would rely on LastUDPPing, which I have a property for and would check in WebReports. However, that property only gets updated when the client receives a UDP 52311 packet via a site propagation, which has no set interval. What’s the workaround there? Have some scheduled process propagate the Master Action site every few hours?

JasonWalker · September 20, 2016, 5:02pm

LastUDPPing is not actually required to run an action; that’s only used to notify the client that a new version of the site is available and the client will accelerate its gather and execute the new action.

If UDP is blocked to the client, it will still gather and execute actions periodically (by default, every 24 hours).

cstoneba · September 20, 2016, 5:08pm

True, but having UDP 52311 blocked prevents on demand “run this now” action execution, which is what I’m trying to monitor the ability of.

cstoneba · September 28, 2016, 4:18pm

If I could get the BES Client API to report back on Last Report Time, would that data be retrieved from the local BES Client (and therefore not that reliable for Last Report Time), or would it actually pull from the Root server?

cstoneba · October 3, 2016, 6:44pm

Anyone? Where does the data in the BES Client API come from?

cmcannady · October 3, 2016, 8:00pm

@cstoneba,

The data pulled from the BES REST API is the same that you’d see in the BES console. If the client hasn’t reported, then you won’t see updated details for that client regardless of which central interface you choose to confirm the data.

It may be easier to query the BES REST API for endpoints that haven’t reported for longer than some period of time. For example, build off the query below and use a last report time older than X as the condition.

https://yourbesserver.yourdomain.com:52311/api/query?relevance=(IDs of it, names of it, last report time of it) of bes computers whose (name of it as string as lowercase = "somecomputername" as lowercase)

You’ll need a custom task or a 3rd party monitoring solution to parse the returned XML data and act accordingly.
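As a rough illustration of that parsing step, here is a sketch in Python. Several things in it are assumptions rather than confirmed behavior: the stale-computer relevance (a variation on the query above with a made-up 4-hour threshold, untested against a live server), the response shape (`Tuple` elements wrapping `Answer` elements), and the hand-written sample XML with a hypothetical computer ID. Fetching the URL and authenticating are left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import quote

# Assumed session relevance; the 4-hour threshold is illustrative only.
STALE_RELEVANCE = (
    "(IDs of it, names of it, last report time of it) of bes computers "
    "whose (now - last report time of it > 4 * hour)"
)


def build_query_url(server: str, relevance: str) -> str:
    """URL-encode the relevance and attach it to the /api/query endpoint."""
    return f"https://{server}:52311/api/query?relevance={quote(relevance, safe='')}"


def parse_tuples(xml_text: str) -> List[Tuple[str, ...]]:
    """Collect the Answer values of each Tuple (assumed response shape)."""
    root = ET.fromstring(xml_text)
    return [
        tuple((answer.text or "") for answer in row.iter("Answer"))
        for row in root.iter("Tuple")
    ]


# Hand-written sample standing in for a server response (hypothetical data).
SAMPLE_XML = """<BESAPI>
  <Query>
    <Result>
      <Tuple>
        <Answer type="integer">12345</Answer>
        <Answer type="string">somecomputername</Answer>
        <Answer type="time">Mon, 08 Aug 2016 15:05:22 -0500</Answer>
      </Tuple>
    </Result>
  </Query>
</BESAPI>"""

if __name__ == "__main__":
    print(build_query_url("yourbesserver.yourdomain.com", STALE_RELEVANCE))
    for computer_id, name, last_report in parse_tuples(SAMPLE_XML):
        print(computer_id, name, last_report)
```

Encoding the relevance before sending it avoids problems with the spaces, parentheses, and quotes in the query string.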

Hope this helps

@cmcannady

AlanM · October 4, 2016, 12:54am

Do you mean a CLIENT API or the REST API?

The Client API is known as the Client Compliance API and is used for programs on the endpoint to get some information from the local client.

The REST api is on the server and gets its data from the root server (which gets it from the endpoints)

steve · October 4, 2016, 5:33am

I think you’re mixing opposing monitoring efforts here. Monitoring the last report time and last ping received, as known by the client, makes sense to monitor from the endpoint. And this is the view you will get from the Client Compliance API, as Alan mentioned.

But it doesn’t make sense to me to also try to monitor other endpoints’ status from a particular endpoint (e.g. a relay higher upstream than its parent). This monitoring should already be pretty evident on the server by that relay itself not reporting these same properties and/or its last report time for an extended period.

In terms of the effectiveness of measuring the last UDP ping received, what problem are you actually trying to detect? You can use a fixlet to detect if the local firewall is blocking UDP, so I would assume that you are worried about an endpoint moving to a new part of the network that is blocking UDP at a network level (router or HW firewall), or someone changing existing network config. For this case, showing a UDP command received in the last day or 2 would seem sufficient unless your deployment has extremely low usage or is air-gapped.

cstoneba · October 4, 2016, 1:56pm

Thanks for the responses. What I’m trying to do is have a local monitoring agent, installed on the local endpoint, that can monitor that the BES client is functioning… The 2 properties that I know of to best track that would be “Last Report Time” and “LastUDPPing”. However, those managed property values are not stamped anywhere local to the BES client. So I was wondering if the BES Client API (aka Client Compliance API) would be a good way for the properties to be retrieved by the monitoring agent.

Alan, you said that the Client Compliance API gets the data from the local BES client. If we take Last Report Time for example, then that value would not be trustworthy at the local Client level because the Client is unaware of an upstream Relay’s Relay that is down. Correct?

One alternative option I might try is to just have an action pushed to all computers that sets a local timestamp of “now” within the BES Client settings. The action would be pushed every x hours (not the same recurring action, but a new action every x hours). Then the monitoring agent could look at the client setting timestamp to see if it is recent. If the action runs, then it proves that both UDP 52311 from Relay to Client AND TCP from Client to Relay are open.

strawgate · October 4, 2016, 2:46pm

I’m wondering if there isn’t a side effect or a couple of side effects of a properly functioning client that you could monitor:

maximum of last gather times of sites

Which would show the last time a site was successfully gathered from the upstream server?

steve · October 4, 2016, 3:05pm

Yes, you use the Client Compliance API to evaluate client relevance on the endpoint for use by an external application. So the relevance last report time of client and last command time of client could be checked by a monitoring utility to see if the agent is still functioning.

I disagree with the statement that these values “would not be trustworthy.” These values would be coming from the source (the agent), so they are definitely trustworthy, and would correctly indicate that the agent is functioning. Whether that value makes it to the server because an upstream relay is down has nothing to do with whether the agent is functioning, and would not be something that you could resolve with your monitoring agent. The monitoring of upstream relay health should be done on the server, or on that relay itself.

I would strongly discourage pushing a new action to all computers every X hours to validate agent responsiveness. This wouldn’t give you any more information about why a client doesn’t respond to the action than you already get with last report time on the server (you still couldn’t tell the difference between a client being offline, not functioning, or a relay not functioning without doing the other things mentioned), and would create a lot of unnecessary overhead in the environment.

It seems that you have some agents/relays that are not functioning reliably driving this activity. Maybe it would be more effective use of time to investigate those specific endpoints, so you can feel confident in the data that already flows into BigFix.

cstoneba · October 4, 2016, 3:24pm

Hi Steve. My Client monitoring goal is to ensure both that the Client is active and that it is reporting up through the BES infrastructure (so that the client can actually be used within an action when needed). So for that reason, I think checking the “last report time of client” at the Client Compliance API level is not what I want.

Similarly, checking “last command time of client” from the Client Compliance API would only confirm the last time that UDP 52311 was received by the client. Without a constant site propagate (or action) every X hours, that value will become stale and won’t be a good indication if UDP 52311 is getting through or not.

Can you elaborate on why you would discourage pushing a new action to all computers every X hours? If the monitoring goal is to ensure that both TCP 52311 AND UDP 52311 are getting through, then pushing an action (that completes successfully) every X hours would prove that. True that it would take more overhead than just reading the 2 properties mentioned above, but for me, those are accurate enough.

cmcannady · October 4, 2016, 7:59pm

My original response to this thread focused specifically on querying the BES REST API to get a listing of endpoints that hadn’t reported in some period of time. Clearly there are issues with this approach, specifically endpoint attrition and validity.

In my experience, it would be better to have a local monitoring tool to alert and/or automate the start/restart of the BESClient in an error or unknown state.

steve · October 4, 2016, 10:16pm

I guess I’m not understanding why looking at the last report time in the console doesn’t tell you whether the client is active or not. If you see a current last report time in the console, then you know the client is up and actually functioning to the point that it is reporting, and you know that the reports are flowing properly through the relay infrastructure.

So the only remaining possible issue is if UDP traffic is not flowing to the endpoint. Using a fixlet to check for UDP being blocked by firewall and reporting a property with the last command time of client should be sufficient to tell you whether UDP is functioning. Unless there are transient network issues occurring that prevent UDP transmissions randomly that you’re also trying to detect. If that’s the case, though, would anything be frequent enough to actually detect such an issue?
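The two-signal reasoning in the last two paragraphs can be sketched as a small decision function. This is an illustration only: it assumes the last report time and last command time have already been retrieved as timezone-aware datetimes, and the thresholds (4 hours, 2 days) and status labels are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify(last_report: datetime,
             last_command: Optional[datetime],
             now: datetime,
             report_max: timedelta = timedelta(hours=4),
             command_max: timedelta = timedelta(days=2)) -> str:
    """Combine both timestamps into one status label (labels are hypothetical)."""
    # A stale report time means the client, or the relay path above it,
    # is not delivering results; UDP cannot be judged in that case.
    if now - last_report > report_max:
        return "not-reporting"
    # Reports are arriving, so only UDP notification remains in question.
    # None stands for "no command time available".
    if last_command is None or now - last_command > command_max:
        return "reporting-udp-suspect"
    return "healthy"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(classify(now - timedelta(minutes=30), now - timedelta(hours=6), now))
    print(classify(now - timedelta(minutes=30), now - timedelta(days=5), now))
```

A stale command time on its own is only a hint, since in a quiet deployment no UDP notification may have been sent during the window.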

The only reason I see to try and monitor things on the endpoint directly, is if you intend to take corrective action based on what is detected. Is that the case here? Are you planning to restart the agent and/or update firewall rules based on what the monitor finds?

If you deploy an action every X hours, it is good confirmation for working clients, but those would also have accurate “last report time” values. It wouldn’t give you any real insight into what is wrong with the clients that don’t respond, though. Are they shut down, off network, service stopped, service hung, UDP not working, relay busy/down, etc.? You would still need to do additional investigation or run the monitor or just ignore them. So you’re generating all these additional actions, gathering traffic, action results, reporting traffic, and interrupting every client on this schedule just to re-confirm what the console displays, but not actually helping fix the problem agents.

So all you know is which agents not to target for some later action because they won’t respond anyway. How is that valuable??