Hello,
I am working toward correcting a problem that crept into our environment during installations. We had rolled certain information into our installer to avoid some network load, and that ultimately led to problems. We were seeing a lot of "<not reported>" under things as simple as the BES Client version, and the full operating system name and service pack were also coming up as "<not reported>". The fix ultimately came to be creating a job that deleted the _BESData folder on the system, after which the client re-downloads the needed information. My problem lies in collecting and automatically fixing those units. I need help finding a way to automatically group my results of the retrieved properties mentioned above, and then apply a never-ending fixlet to that automatic group, so that as broken systems show up, they also get the repair. Any guidance is appreciated.
I've been seeing something similar in our environment for a few months now. I've created a task that just clears out the "__results" file and then restarts the client, and it then reports normally, but I have to run it manually. I'm not sure why it does this, but since the "<not reported>" is happening on the console side, setting up relevance to put these clients in a specific group may not be possible, since the evaluation of which groups to join happens on the client.
Sure have; it doesn't seem to have an effect, however. I am thinking that it is due to the items we tried to roll into the installer. After we clear out the _BESData folder, the client checks back in and the items correct themselves and report the way they are supposed to. I am about to try what jmaple suggested about the "__results" file and at least see if that corrects the issue with a smaller amount of traffic. I am not sure why the "send refresh" doesn't do the trick…
I believe the Send Refresh only works if the client gets UDP notifications.
When you send refresh, do you see in the client logs that a refresh command was received?
It should also eventually show in the log that it submitted a full report successfully. Once that happens, it means its relay has the full report, and it should get passed up the chain to the root and be reflected in the console. The process from the client submitting the full report to it showing up in the console could take a minute or more, depending on the complexity of your network and infrastructure. Also, the time it takes the client to calculate everything required for a full report can vary depending on how many properties it has to report.
When we send a refresh against the system, it does get it and does post a full report; however, the status of the two mentioned properties stays the same.
I've attempted the same thing. The only thing that worked for sure was clearing the "__results" file (possibly corruption of the file, but I'm unsure how to determine that through analysis) or deleting the whole _BESData folder and restarting the client.
As a test, I made a copy of a "bad" "__results" file before deleting it and compared the contents to the newly created one, and everything it contains looks to be exactly the same. The only odd thing is that the bad file is 4 bytes bigger than the good file.
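If it helps anyone trying the same comparison: a quick sketch for finding exactly where two nearly identical files diverge, rather than eyeballing the contents. The file paths are placeholders; point them at your saved "bad" copy and the regenerated file.

```python
def first_difference(path_a, path_b):
    """Return (offset, bytes_from_a, bytes_from_b) at the first divergence,
    or None if the files are byte-for-byte identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    # Scan the common prefix for a differing byte.
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, a[offset:offset + 16], b[offset:offset + 16]
    # Prefixes match; any size difference (e.g. 4 extra bytes) is at the tail.
    if len(a) != len(b):
        cut = min(len(a), len(b))
        return cut, a[cut:cut + 16], b[cut:cut + 16]
    return None

if __name__ == "__main__":
    # Placeholder paths — substitute the real __results copies.
    diff = first_difference("results_bad.bin", "results_good.bin")
    print("identical" if diff is None else "first difference at offset %d" % diff[0])
```

If the only output is a tail difference at the end of the shorter file, the extra 4 bytes are pure trailing data, which would at least tell you whether the divergence is a truncation/append issue rather than corruption in the middle.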
There doesn't seem to be any way to tell why the bad file is different from the original, nor is there a way to know why the client won't read the bad file anymore; but as soon as it's recreated, the client begins reporting normally.
EDIT: "The client reading the file" was probably not the right way to say it. When the results this file contains are sent off, they don't seem to be interpreted correctly when they are ingested into the database?
This sounds like the same issue discussed here: Searching Relevance for <not reported>. That other thread explains why you can't use automatic groups or dynamically target such endpoints, and describes the suggested approach. To automate it, you would have to use the REST API to deploy the actions automatically based on what the server sees.
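As a rough illustration of that REST API approach: query the server with session relevance for the affected computers, then post an action sourced from your existing repair task against them. This is only a sketch — the server URL, site name, task ID, and especially the session-relevance expression are placeholders you would adapt to your deployment, and authentication is omitted.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

SERVER = "https://bigfix.example.com:52311"  # placeholder root server

# Illustrative only: adjust this to however "<not reported>" manifests
# for your retrieved properties.
RELEVANCE = ('names of bes computers whose '
             '(not exists values of results (bes property "OS", it))')

def query_url(server, relevance):
    """Build the /api/query URL for a session-relevance expression."""
    return server + "/api/query?relevance=" + urllib.parse.quote(relevance)

def action_xml(site, task_id, computer_names):
    """Build a BES SourcedFixletAction targeting the listed computers by name."""
    targets = "".join("<ComputerName>%s</ComputerName>" % escape(n)
                      for n in computer_names)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<BES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
            "<SourcedFixletAction>"
            "<SourceFixlet>"
            "<Sitename>%s</Sitename><FixletID>%d</FixletID>"
            "</SourceFixlet>"
            "<Target>%s</Target>"
            "</SourcedFixletAction></BES>" % (escape(site), task_id, targets))

if __name__ == "__main__":
    # The actual POST (run on a schedule) would look roughly like this;
    # credentials and error handling are left out of the sketch.
    body = action_xml("CustomRepairSite", 123, ["PC-001"]).encode()
    req = urllib.request.Request(SERVER + "/api/actions", data=body, method="POST")
    # urllib.request.urlopen(req)  # requires credentials and a reachable server
```

Run on a schedule (cron, Task Scheduler), this effectively gives you the "never-ending" behavior the OP wanted, but driven from the server side, where the "<not reported>" state is actually visible.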
I'm confused as to why this requires automation, though. If you've identified the problem, I would assume you're not deploying the bad package anymore, so why do you expect to see problem systems continuing to show up in the console?
@steve Automation of this problem would be nice because, while I'm not able to determine the root cause of why this happens, I get 4 to 5 machines that do this every week. Most of them are PCs, and my suspicion is that they were decommissioned and are being recommissioned without being cleaned up properly. I don't run our Site Support team who commissions them, so I don't have an effective way to test; but at the moment these machines need intervention, and I'd rather not have to run the task manually if I can avoid it for such a simple procedure.
Yeah, for sure, but are there any markers on the bad one that would make it stand out (something I could scrape) in order to identify that a system has a bad __results file?
That's what I've been trying to determine, and from all the tests I've done, there doesn't seem to be any difference between the files. They look to be exactly the same.
The _results file is a copy of the property results the agent has already reported. The agent uses this file to tell which property results have changed since it last sent a report. When you delete this file, it causes the agent to behave as if the server does not have previous values for any of the "globally activated" properties, and so the agent reports the values whether they have changed or not.
I would try to figure out why your agent's understanding of what it has reported to the server differs from what the console shows. Some possible areas to investigate: the console cache may not have been filled properly; FillDB may not have successfully placed the values in the database; or the server may have discarded the prior results when you edited the property expression, yet the result was the same on the agent, so it didn't report a value.
@AgentGuy I find it unlikely this is an issue with the console, because as soon as the client generates a new _results file, all properties start reporting correctly. I can only assume that the file is slightly corrupted when the client sends it to the core. If it weren't the file, I would think deleting it wouldn't change what we see in the console.
What @AgentGuy is saying is that deleting the results file is the same as telling the client to send a full report, so the file isn't the cause; it is a mismatch between what the client believes it has reported and what the server has. While there may be some unique cases, I would expect most (if not all) of these cases could be resolved using a SendRefresh or (if UDP is not reaching the endpoint) a "notify client forcerefresh" action.
There was an issue with the BESComputerRemover that could cause this to occur, where fully deleted endpoints were not being marked properly for refresh if they came back online. If you haven't updated your BESComputerRemover in several months, you might check for our latest version to address this.
My question about the need for automation was really directed at the OP, who had an identified cause that was no longer occurring. For other intermittent occurrences, the BESComputerRemover should provide sufficient automation, as it is capable of removing "Not Reported" computers and marking them for a full refresh on the next registration.