BES Client not reporting and requires reboot to start

Hi All,

I am currently having an issue where the BES client locks up on random systems after a reboot. I have determined the root cause, but want to explain what is happening in case anyone else is seeing this issue.

When the system is in this state, the service starts and I can see the client log file updating until it freezes at the entry:

Encryption: optional encryption with no certificate; reports in cleartext

If I try to stop the service, it will fail to stop and the service state will report "Stopping". I can end task the service and it will let me restart it, but it still will hang at the above message.

I have determined that the root cause is a call to the Win32_ServerFeature class, which does not return any data. When I log on to the problem system and use wbemtest with the query "SELECT * FROM Win32_ServerFeature", the status reports "Operation in progress..." and never returns. Any other WMI query works fine (except Win32_QuickFixEngineering, see below). I have seen the same issue on multiple servers. Rebooting the server seems to clear up the issue.
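To check from outside the client whether a given WMI class hangs on a machine, one option is to run the query in a child process with a hard timeout. The following is a minimal Python sketch and not anything the BES client itself does. The function name `probe_command`, the 30-second limit, and the PowerShell `Get-CimInstance` command line are all assumptions chosen for illustration.

```python
import subprocess
import sys


def probe_command(cmd, timeout_s=30.0):
    """Run cmd in a child process and return (status, output).

    status is "ok", "error" (non-zero exit code) or "timeout".
    """
    try:
        done = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if done.returncode == 0 else "error"
    return status, done.stdout.strip()


# Hypothetical probe command; assumes powershell.exe is on PATH (Windows only).
WMI_PROBE = [
    "powershell.exe", "-NoProfile", "-Command",
    "Get-CimInstance -ClassName Win32_ServerFeature | "
    "Select-Object -ExpandProperty Name",
]

if __name__ == "__main__" and sys.platform == "win32":
    status, output = probe_command(WMI_PROBE)
    print(status)
    print(output)
```

A "timeout" status on a system where other classes answer quickly would match the wbemtest behavior described above. Because the query runs in a separate process, a hung WMI provider only costs the probe its timeout rather than blocking the caller.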

I have a PMR open with IBM to look into this issue (opened before I determined that WMI was the cause), and a bug was opened to look into a way to time out the call to WMI.

I had a custom managed property that did the following:
if ((name of it contains "Win2008" or name of it contains "Win2012") of operating system) then (if (exists wmi) then (exists string values of properties "Name" of select objects "* from Win32_ServerFeature where Name = 'Remote Desktop Services'" of wmi) else (false)) else (false)

There is also the analysis "Installed Windows Patches Information" in the BigFix Labs site that uses Win32_QuickFixEngineering and seems to have the same issue.

Welcome to WMI hell, a new extension of the DLL hell that has plagued Windows for most of its life.

WMI can be messed up, and this is why we have a setting to disable calling into WMI.

It looks like over time something may have been added that forces Windows to allow WMI calls to time out, so your PMR should be coming back to you with some info that there may be something we can do to keep a bad WMI system from hanging us indefinitely. Again, this will depend on the OS doing the right thing, but it's a promising find I just made.

WMI can also have other "strange" vendor-associated things hooked into it that run whenever anyone makes a WMI call (and this is often the cause of these hangs or misconfigured setups).


Hi Alan,

Yeah, I try to avoid WMI, but it seems to be the only way to get some of the info that I need.

In the past I have also used PowerShell to get the same info (installed features): a script would write the output to a file and I could then retrieve the file contents. I was trying to avoid that two-step process, but maybe I have to go back to it for this information.
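For anyone weighing the two-step approach, here is a rough Python sketch of the file side of it, assuming some separate script has already collected the feature names. The helper names `write_feature_cache` and `read_feature_cache` are hypothetical. The file is replaced atomically so that whatever reads it never sees a half-written list.

```python
import os
import tempfile


def write_feature_cache(names, path):
    """Write one feature name per line, replacing path atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for name in names:
                handle.write(name + "\n")
        os.replace(tmp_path, path)  # readers never see a partial file
    except BaseException:
        # Clean up the temporary file if writing or replacing failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_feature_cache(path):
    """Return the cached feature names, or an empty list if no cache exists."""
    try:
        with open(path, encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]
    except FileNotFoundError:
        return []
```

On the client side the property would then read the file contents rather than query WMI, so a hung Win32_ServerFeature call could stall the collecting script but not the client evaluation loop.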

I hope that you can find that timeout setting.

Thanks

Martin

We also experience issues where the BES client service on Windows cannot be restarted and the server has to be rebooted. I checked, and I also have an analysis that pulls from the Win32_ServerFeature class in WMI. Not sure if that is the corrupt WMI piece that is the root cause for both of us, or just a coincidence.

Whatever PMR was open (any chance you have the number?), it wasn't fixed, and we're using 9.5.12. We've had this analysis in place for years, and just now we had this same issue: the endpoint stops reporting in, you can only kill the service, and then it hangs again. With diag logging enabled you can see that it stops on that exact WMI call. Running the call in QnA hangs and using WMI Explorer hangs, so it is certainly WMI.

Yes, we all try to avoid WMI like the plague, but the BES client needs to be able to continue if it encounters a faulty WMI call, and not hang.
