I am currently seeing the BES Client lock up after a reboot on random systems. I have determined the root cause and want to explain what is happening in case anyone else is seeing this issue.
When the system is in this state, the service starts and I can see the client log file updating until it freezes at the entry:
Encryption: optional encryption with no certificate; reports in cleartext
If I try to stop the service, it fails to stop and the service state reports "Stopping". I can end the service's process in Task Manager, and then it lets me restart the service, but it still hangs at the above message.
I have determined that the root cause is a call to the Win32_ServerFeature class, which never returns data. When I log on to the problem system and run the query "SELECT * FROM Win32_ServerFeature" in wbemtest, the status reports "Operation in progress..." and never returns. All other WMI queries work fine (except Win32_QuickFixEngineering; see below). I have seen the same issue on multiple servers. Rebooting the server seems to clear up the issue.
I have a PMR open with IBM on this issue (opened before I determined that WMI was the cause), and a bug has been opened to look into a way to time out the call to WMI.
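To illustrate what "timing out the call" means in general terms, here is a minimal Python sketch of the usual pattern: run the blocking query on a worker thread and stop waiting after a deadline. It does not touch WMI or the BES Client; `call_with_timeout`, `fake_hanging_query` and `fake_healthy_query` are hypothetical names, and the hung WMI call is only simulated with `time.sleep`. Note that the caller merely gives up waiting; the stuck thread is not cancelled, so a real fix still depends on the OS letting the underlying call time out.

```python
import threading
import time
from typing import Any, Callable


def call_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run fn on a daemon thread and return its result.

    Raises TimeoutError if fn is still running after timeout_s seconds.
    """
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = fn()
        except Exception as exc:  # hand errors back to the caller
            outcome["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        # We stop waiting, but the worker stays blocked and cannot be killed
        # from here; daemon=True only keeps it from blocking interpreter exit.
        raise TimeoutError(f"query still running after {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")


def fake_hanging_query() -> list:
    """Hypothetical stand-in for a WMI query that never returns."""
    time.sleep(3600)
    return []


def fake_healthy_query() -> list:
    """Hypothetical stand-in for a WMI query that answers normally."""
    return ["Remote Desktop Services"]


if __name__ == "__main__":
    print(call_with_timeout(fake_healthy_query, 1.0))
    try:
        call_with_timeout(fake_hanging_query, 0.5)
    except TimeoutError as err:
        print("gave up:", err)
```

Raising `TimeoutError` rather than returning a sentinel keeps "the query timed out" distinct from "the query legitimately returned nothing".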
I had a custom managed property that used the following relevance:
if ((name of it contains "Win2008" or name of it contains "Win2012") of operating system) then (if (exists wmi) then (exists string values of properties "Name" of select objects "* from Win32_ServerFeature where Name = 'Remote Desktop Services'" of wmi) else (false)) else (false)
The "Installed Windows Patches Information" analysis in the BigFix Labs site uses Win32_QuickFixEngineering, which also seems to have the same issue.
Welcome to WMI hell, a new extension of the DLL hell that has plagued Windows for most of its life.
WMI can get into a broken state, which is why we have a setting to disable calls into WMI.
It looks like something may have been added to Windows over time that allows WMI calls to time out, so your PMR should come back to you with information on what we might be able to do so that a bad WMI system cannot hang us indefinitely. Again, this will depend on the OS doing the right thing, but it's a promising find I just made.
WMI can also have other "strange" vendor components hooked into it that run whenever anyone makes a WMI call, and these are often the cause of such hangs or misconfigured setups.
Yeah, I try to avoid WMI, but it seems to be the only way to get some of the info I need.
In the past I have also used PowerShell to get the same info (installed features) by writing it to a file whose contents I could then retrieve, but I was trying to avoid that two-step process. Maybe I have to go back to it for this information.
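The appeal of the two-step approach is that a separate job does the WMI work and writes the result to a file, while the check only reads that file, so a hung WMI provider can stall the writer but never the reader. Below is a minimal Python sketch of the reading side only (it is not BigFix relevance). The one-feature-per-line format, UTF-8 encoding, the one-day `max_age_s` staleness limit and the function name `has_feature` are all hypothetical assumptions; a Windows PowerShell writer would need to be told to emit UTF-8, since its file output defaults may differ.

```python
import os
import tempfile
import time
from typing import Optional


def has_feature(path: str, name: str, max_age_s: float = 86400.0) -> Optional[bool]:
    """Return True/False from a fresh feature list at path, or None if unknown.

    Hypothetical format: one installed feature name per line (UTF-8),
    written by a separate scheduled job such as a PowerShell script.
    """
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return None  # the writer has not produced the file yet
    if age > max_age_s:
        return None  # stale: the writer may itself be stuck on WMI
    with open(path, encoding="utf-8") as f:
        features = {line.strip() for line in f if line.strip()}
    return name in features


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "features.txt")
        print(has_feature(p, "Remote Desktop Services"))  # None: no file yet
        with open(p, "w", encoding="utf-8") as f:
            f.write("Remote Desktop Services\nFile and Storage Services\n")
        print(has_feature(p, "Remote Desktop Services"))  # True
        print(has_feature(p, "Web Server (IIS)"))  # False
```

Returning `None` for a missing or stale file lets the caller report "unknown" instead of a misleading `False` when the writer job has stopped producing data.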
We also see cases where the BES Client service on Windows cannot be restarted and the server has to be rebooted. I checked, and I also have an analysis that pulls from the Win32_ServerFeature WMI class. I'm not sure whether that is the corrupt WMI piece that is the root cause for both of us or just a coincidence.
Whatever PMR was opened (any chance you have the number?), it wasn't fixed as of 9.5.12, which is what we're running. We've had this analysis in place for years and just now hit the same issue: the endpoint stops reporting in, the service can only be killed, and then it hangs again. With diagnostic logging enabled, you can see that it stops on that exact WMI call. Running the query in QnA hangs, and so does running it in WMI Explorer, so it is certainly WMI.
Yes, we all try to avoid WMI like the plague, but the BES Client needs to be able to continue, rather than hang, when it encounters a faulty WMI call.