BES Root Server Service Crashes Continually

Hello All,

Our BigFix installation (9.1.1117) recently started exhibiting this problem. The BesRootServer.exe service crashes about every 20 minutes (Application Error 1000, faulting module ntdll.dll in the event log).

Here’s what I’ve tried so far:
* SFC scan (no failed files)
* CHKDSK (no problems found)
* Disabled antivirus on-access scanning
* Checked SQL database consistency
* Used the Audit Cleaner Tool to clean up the database
* Rolled back recent system patches

System Info:
IEM/BES 9.1.1117
SQL Server Standard 2005
Windows Server 2008 R2

Any thoughts or ideas welcomed.

I would definitely file a PMR. See: How to ask for IBM product help: PMRs, RFEs, and more

Beyond that, I'm not sure, though I would recommend making a full system backup if you don’t have one, then upgrading the root server to a newer version.

JGStew-

I was hoping to avoid the root upgrade because of all the associated work of upgrading 10,000 clients and their relays on the spur of the moment, but I guess that’s what we needed to do. Our installation is working well now; we just needed to bite the bullet and patch up. Thanks for the response!

-Mat

You can certainly upgrade the Server(s) independently of the Relays and Clients. When upgrading the servers, the only pieces you really have to match versions on are the BES Server, BES Web Reports, and BES Consoles. You can delay the Relay and Client upgrades to convenient times.

Yes, I second exactly what @JasonWalker said.

You DO NOT need to upgrade your clients and relays. In nearly 100% of cases, you should be able to run your relay and client versions a bit ahead of or behind the root with little issue. It is not optimal, but it should be fine. This is especially true if they are in the same major release, or at least the same major version, but even an 8.0 client will work with a 9.1+ relay & root.

In fact, if you do upgrade your root, I would recommend sitting on it for a little while before upgrading your relays, then upgrading your relays and sitting on that for a bit before upgrading your clients. I would also not recommend updating your relays & clients all at the same time; do that progressively.

In the case of the relays, start at the top and work your way down, but don’t upgrade all of a redundant set of relays at the same time. If you have 4 top level relays, upgrade 2, wait a while, then the other 2. If you have 30 relays below that with 2 in each of 15 locations, then don’t upgrade both in the same location at the same time.
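
To make that ordering concrete, here is a rough Python sketch of how the waves could be planned. It only produces a plan and does not talk to BigFix at all. The function name `plan_waves`, the relay names, and the "upgrade half of each redundant group, then the other half" rule are my own assumptions for illustration.

```python
def plan_waves(tiers):
    """Split relay upgrades into waves.

    tiers: list of dicts, top tier first. Each dict maps a redundant
    group (e.g. a location) to the list of relay names in that group.
    All names here are hypothetical placeholders.
    """
    waves = []
    for tier in tiers:
        first_half, second_half = [], []
        for members in tier.values():
            # Take half of each redundant group now (rounded up),
            # and leave the rest for the following wave.
            half = (len(members) + 1) // 2
            first_half.extend(members[:half])
            second_half.extend(members[half:])
        for wave in (first_half, second_half):
            if wave:
                waves.append(wave)
    return waves


if __name__ == "__main__":
    example = [
        {"top": ["top1", "top2", "top3", "top4"]},
        {"siteA": ["a1", "a2"], "siteB": ["b1", "b2"]},
    ]
    for number, wave in enumerate(plan_waves(example), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

A group with only one relay lands in the first wave of its tier, since there is no redundant partner to hold back.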

With the REST API, you could create a repeatable relay upgrade rollout schedule that involves multiple actions taken at once, but with progressively later start times. If anything goes wrong, you can put the brakes on it; if nothing goes wrong, eventually all relays will get upgraded without you having to poke at it multiple times. Since I don’t think you really must start top down with relay upgrades, you could also just distribute the upgrade over a large enough time period that only one relay gets upgraded per X hours, allowing you to stop things if problems appear.
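
As a rough sketch of just the scheduling part of that idea, the Python below computes a start time for each relay, spaced a fixed number of hours apart. It does not call the REST API; creating the actual actions with these start times is left out. The function name `stagger_starts`, the relay names, and the six-hour spacing are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def stagger_starts(relays, first_start, hours_between):
    """Return (relay, start_time) pairs, one relay per time slot.

    relays: ordered list of relay names (hypothetical placeholders).
    first_start: datetime at which the first upgrade may begin.
    hours_between: the "X hours" gap between consecutive relays.
    """
    gap = timedelta(hours=hours_between)
    return [(name, first_start + index * gap)
            for index, name in enumerate(relays)]


if __name__ == "__main__":
    schedule = stagger_starts(
        ["relay-a", "relay-b", "relay-c"],
        first_start=datetime(2024, 1, 1, 8, 0),
        hours_between=6,
    )
    for name, start in schedule:
        print(f"{name}: start at {start.isoformat()}")
```

If something goes wrong after the first relay or two, the later entries have not started yet, so there is time to stop them.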

I am now experiencing this same issue. After a reboot, it can take 12+ hours for the first crash, but once it occurs, the crashes keep recurring at intervals ranging from 10-30 minutes to a few hours. I turned on verbose logging after rebooting and am waiting for it to crash, but I’m not sure how long that will take. Furthermore, I cannot leave verbose logging on at night because I am running too many time-sensitive jobs and performance would be degraded.

I am currently running a 9.2.2.21 root, 9.2.1.48 relays, and a mixed bag of clients at 9.2.1.48 and older. This just started happening the other day and has been persistent as hell, but I am unable to find anything that occurred on that day that could cause something like this. I am hoping not to be forced into an upgrade, but if I did have to, I guess I would go up to the highest patch level of 9.2.X. Any help would be appreciated, thank you.

You should raise a PMR

I did, but it is not doing much good for now. My issue is that I am stuck in a loop at this point. I have to disable and reboot the server before polling begins to ensure the besrootserver does not crash. After polling, I turn on verbose logging and restart the besrootserver, but the server does not crash before polling begins again, at which point I have to turn off polling and reboot the server. Support said that they need a log from when it crashes, and that is all they will have to go on. It looks like at some point I will have to leave verbose logging on, have polling run like garbage, spend a night/morning cleaning up the issues after things don’t run properly, and hope that the system crashes so we can get the log over to support.

Upgrading to 9.2.9.36 fixed the issue as far as I can see; the besrootserver service is no longer crashing. I will report back if I hear anything else.