Client fails to wake up and report

(imported topic written by SystemAdmin)

We’ve been seeing this all too frequently lately. The BES client just “goes to sleep”. It fails to report in to the server and greys out. Occasionally it will just never wake back up. We’ve had this on brand new Dells running XP and Windows 7 and VMware VMs running Server 2008. A couple of the clients do it consistently. Sending a “Send Refresh” from the console almost always wakes them up and they hum along for a few hours, then do it again.

Right now the Windows 7 client I am writing this from is having the issue. In this case the system was booted while off the network and then placed on the network. The BES client tried to make two connections while it was off the network and then just “fell asleep” and never woke back up. It’s been over 2 hours since the last entry in the client log and the system has had a network connection for 2 hours.

Anyone else having this issue? We will be calling support but it’s always good to see what others have done to perhaps fix the problem.

Update: So the system went into standby and sat overnight. Brought the system out of standby and the BES client doesn’t seem to want to wake up again - I’ve waited over an hour. I’ve enabled client side debugging to see if that can shed any light. Hopefully this is a known issue and will be fixed in the point release that seems to be on the horizon.

(imported comment written by BenKus)

Hi jspanitz,

My best guess is that you have big baselines or some other slow content that is slowing your agent down… If your BigFix Agent has too much work to do, it would rather be slow than use too much of the system’s CPU… Someone from support can look at it and you might consider getting the services team to do an optimization service to help you keep things working smoothly: http://support.bigfix.com/services/

Also, you should check your health checks dashboards which should help identify some problems…

Ben

(imported comment written by SystemAdmin)

Ben,

We do have one large baseline but it does not apply to Windows 7 - it’s filtered by having to be a member of a group. Does merely having the baseline slow down all agents across all operating systems, or are they filtered before hitting the client by the relevance? The debug logs don’t show a thing. When the agent fails to wake up, the debug log is quiet too, which leads us to believe there is an agent issue.

Also, what seems to be playing out here is that this issue is only happening on Windows Vista, Windows 7 and Windows Server 2008. The large baseline ran against a number of XP and 2003 servers within 30 minutes and the clients never timed out or went to sleep.

I’ll check with support in the morning.

John

(imported comment written by rmnetops91)

Please post here what you find. We experience the same thing on 2008 servers.

(imported comment written by ggerling91)

We also are seeing this behavior and would like to know the results of your support call.

(imported comment written by BenKus)

Hey guys,

Having big baseline actions will affect all agents in many cases… I am guessing that this is a big part of the issues that you guys are seeing…

Ben

(imported comment written by SystemAdmin)

I have not gotten to call support yet and I won’t be able to until Friday. But I will post the results when I have them.

Ben, if it is a baseline issue, why do we not see the issue on Windows 2003 and XP? And I suspect when I call support they are going to blame it on a baseline without being able to actually see why it is happening. Without any real data to back it up, my hunch is that this is a client compatibility issue with the newer OS’s. Otherwise I would think we would see it across all Windows clients. All I do know is that it is very frustrating and there should be an easier way to see what’s going on on the client.

BTW, just for reference, we updated to 7.2.5 and the problem still exists. I was hoping that would be the magic fix - too bad!

(imported comment written by jessewk)

You can use the troubleshooting task in the BES support site to run the client diagnostics tool. Take a look at the profiling output to see if it sheds any light.

(imported comment written by SystemAdmin)

Jessewk - thx. We actually have done this on multiple clients and it hasn’t been able to pinpoint the cause. Great tool though, has helped in many other situations.

All - at the direction of level 3 support we ran the BES client usage profiler. I have personally never used the tool up to now, but I believe it to be invaluable now that I have. What we found so far - this is all preliminary - is that one certain group of fixlets (all for the same MS patch) was taking an extremely long time to process on WinVista, Win7 and Win2k8, while on WinXP and Win2003 it was acceptable.

I am not sure what the final answer will be or if this is even the reason the client “sleeps” for a few hours at a time, but it’s what I found and turned back over to level 3 support.

For those who wish to see if they are in the same situation, the fixlets are - MS09-035: Vulnerabilities in Visual Studio Active Template Library Could Allow Remote Code Execution - Visual C++ 2005 SP1 for both 32 bit and 64 bit.

Happy Troubleshooting / Halloween.

(imported comment written by SystemAdmin)

I just dug a little deeper into this this morning and found that the fixlet relevance searches the Side by Side folder - the %windir%\winsxs folder path. On our systems that folder contains over 9 GB of data and 45,000 files on Vista and 2008 machines and 14 GB and 65,000 files on Win7. On Windows XP there are roughly just 52 MB and 265 files and on 2003, 55 MB and 338 files. Your mileage may vary depending on what software / updates you have installed.

So this explains the huge time differences when running the relevance logic across the different operating systems. Looks like the fixlet relevance is the only problem here. I’ll update the thread when I get the official answer from BigFix.
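For anyone who wants to compare their own machines, here is a minimal sketch that counts the files and total bytes under a folder. It is not what the fixlet relevance does, just a quick way to get the same kind of numbers quoted above. The function name `folder_stats` is made up for this example, and the default path assumes the Side by Side folder lives at %windir%\winsxs. Note that this folder relies heavily on hard links, so a simple byte total like this one may overstate the real disk usage.

```python
import os
import sys


def folder_stats(root):
    """Return (file_count, total_bytes) for every file under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # Skip files we cannot stat (permissions, broken links).
                continue
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    # Assumed default location; pass a different folder as the first argument.
    default_root = os.path.join(os.environ.get("windir", r"C:\Windows"), "winsxs")
    root = sys.argv[1] if len(sys.argv) > 1 else default_root
    count, size = folder_stats(root)
    print(f"{root}: {count} files, {size / 1024 ** 3:.2f} GB")
```

Run it with no arguments on a Windows box to check the Side by Side folder, or pass any other folder path to compare.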

(imported comment written by BenKus)

Hi jspanitz,

I spoke with the Fixlet team and they are looking into this… We will have an answer for you shortly…

We have used relevance like this before in other Fixlets (MS04-028 and MS08-052) and it was never reported as a problem… but I can see that my Vista computer has 11,000 files so it does appear to be standard for the winsxs folder to have lots of files…

Ben

(imported comment written by SystemAdmin)

Ben,

Great to hear! Pulling those fixlets from our baselines has brought the affected systems back to life. But that, in our opinion, defeats the purpose of a baseline for compliance purposes.

We are anxiously awaiting to see what the fixlet team comes back with as a result. Thanks again.

John

(imported comment written by BenKus)

Hey John,

So this is expensive relevance for this Fixlet if there are lots of folders, but it shouldn’t cause the types of delays you are seeing for the Fixlet alone… Are you using non-efficient mime? If so, I think part of the issue here is that if you have a baseline using non-efficient mime, this relevance gets repeatedly copied over and over and it really exacerbates the problem.

The GDI patch Fixlets are tricky because I believe you do need to iterate through all those folders and look for the GDI files. We are examining other methods that might help, but it is not clear that there is another way.

So for the moment, if you are using non-efficient mime, pulling the GDI Fixlets from the baseline should help a lot (and don’t forget to stop any open baseline actions that have this Fixlet in them).

Ben

(imported comment written by SystemAdmin)

Ben, we are using efficient mime.

I wonder how Windows Update determines if a system needs the patch. I’m not sure if there is any way to figure it out and we’ve never actually timed Windows Update on a system that only needs that patch - to see how fast it figures it out.

Thanks for staying with us on this. Much Appreciated.

(imported comment written by BenKus)

Hey jspanitz,

The check itself is easy and relatively quick when using the full speed of the system. Windows Update only runs when you ask it to run and it uses all the system’s resources. We could easily implement a system like that, but then it forces you to run in “batch mode” like other systems management tools, where the agent wakes up periodically (maybe once a day?), uses all the system’s resources to do checks for a few minutes, and then goes back to sleep. Then of course you have all the issues around scheduling the batch time, and if the user is present, it bothers the user when the system’s resources are being used.

The BigFix Agent does its best to provide “real-time” data and policy enforcement by checking things in the background in a way that is carefully optimized not to interfere with other operations of the computer. In the particular case in this thread, there is something that is causing the agent to report back slower than normal (which in practice is much faster than the other tools which only check periodically) and so we are trying to isolate this so that we can get you back to your report times of 15 minutes (or whatever you are using).
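To illustrate the general idea of background checking versus batch mode, here is a rough sketch of a throttled loop that does a short slice of work and then pauses. This is only an illustration of the concept and not the BigFix Agent’s actual implementation. The name `run_throttled` and the `work_ms` / `sleep_ms` values are assumptions made up for the example.

```python
import time


def run_throttled(tasks, work_ms=10, sleep_ms=480,
                  sleep=time.sleep, clock=time.monotonic):
    """Run zero-argument callables, pausing once a work slice is used up.

    Returns the number of pauses taken. work_ms and sleep_ms are
    illustrative values, not settings taken from any real agent.
    """
    work_budget = work_ms / 1000.0
    pauses = 0
    slice_start = clock()
    for task in tasks:
        task()
        # Once the current slice has used its budget, yield the CPU.
        if clock() - slice_start >= work_budget:
            sleep(sleep_ms / 1000.0)
            pauses += 1
            slice_start = clock()
    return pauses
```

With a loop like this, one very slow check (say, walking a folder with tens of thousands of files) stretches the whole evaluation cycle, because the time spent paused between slices grows with the amount of work. A batch-mode tool would finish the same check faster but take the whole machine while doing it.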

Ben

(imported comment written by SystemAdmin)

Ben,

Sorry, I was more or less just referring to the logic they use to check for relevance. I really hadn’t thought the whole process through like you just did.

No doubt we believe in BigFix and the methods it employs!

John