IEM agent 9.0.787 CPU spike issue

(imported topic written by NirajG)

We have observed many Windows agents spiking to as much as 70% CPU. The affected agent operating systems include Windows XP, Windows 7, and Windows Server 2008. There are no open actions that could lead to a CPU spike. The only activated analyses are from the Inventory site.

Server scenario: Windows Server 2008 R2 + local SQL Server 2008 + top-level and branch relays to scale the architecture. The IEM version is a fresh install of 9.0.787.

Client scenario: Currently we have about 30,000 endpoints in the console; no actions are sent to clients apart from a daily blank task.

Clients are on manual relay selection with the following settings:

_BESClient_Log_Days=30

_BESClient_Comm_UseUrlMoniker=1

_BESClient_Comm_SkipInternetActiveTest=1

_BESClient_Register_IntervalSeconds=10800

_BESClient_Download_MinimumDiskFreeMB=200

_BESClient_Comm_CommandPollEnable=1

_BESClient_Comm_CommandPollIntervalSeconds=3600

_BESClient_Download_DownloadsCacheLimitMB=200

_BESClient_Download_UtilitiesCacheLimitMB=20

_BESClient_Report_MinimumInterval=300

__RelaySelect_Automatic=0

__RelayServer1=http://:52311/bgmirror/downloads/

__RelayServer2=http://:52311/bgmirror/downloads/

__RelayServer3=http://< Secondary Top level relay IP>:52311/bgmirror/downloads/
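For anyone comparing a settings list like this across clients, here is a minimal Python sketch that parses `name=value` lines and flags relay URLs that have no host part. The function names (`parse_settings`, `relays_missing_host`) are made up for illustration and are not part of any IEM tooling. An empty host in the list above may simply be how the forum rendered a placeholder, so treat a hit as something to double-check rather than a confirmed misconfiguration.

```python
from urllib.parse import urlsplit


def parse_settings(text):
    """Parse 'name=value' lines into a dict, skipping blank or malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def relays_missing_host(settings):
    """Return the names of __RelayServerN settings whose URL has no host."""
    missing = []
    for name, value in sorted(settings.items()):
        if name.startswith("__RelayServer") and not urlsplit(value).hostname:
            missing.append(name)
    return missing


# Sample values copied from the post as they appear there.
sample = """
__RelaySelect_Automatic=0
__RelayServer1=http://:52311/bgmirror/downloads/
_BESClient_Report_MinimumInterval=300
"""
print(relays_missing_host(parse_settings(sample)))
```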

Whenever there is a CPU spike, the following error is observed in the client logs:

Retry error, attempt 8 failed for SetModificationTime (C:\Program Files\BigFix Enterprise\BES Client\__BESData\actionsite\__Local\Get\ComputersRevoked.crl)

Gather::SyncSite caught FileIOError (44) FileIOError
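To see how often this happens and against which files, a small script can tally the retry lines in a client log. This is only a sketch: the regular expression is inferred from the single log line quoted above, and `summarize_retry_errors` is a hypothetical helper name, not an IEM utility.

```python
import re
from collections import Counter

# Pattern inferred from the one log line in this post; other formats may exist.
RETRY_RE = re.compile(r"Retry error, attempt (\d+) failed for (\w+) \((.+)\)")


def summarize_retry_errors(lines):
    """Count retry errors per (operation, file) and record the highest attempt seen."""
    counts = Counter()
    max_attempt = {}
    for line in lines:
        match = RETRY_RE.search(line)
        if not match:
            continue
        attempt = int(match.group(1))
        key = (match.group(2), match.group(3))
        counts[key] += 1
        max_attempt[key] = max(attempt, max_attempt.get(key, 0))
    return counts, max_attempt
```

Feeding it the lines of a client log (for example via `open(path, errors="replace")`) gives a quick picture of whether one file or many are affected.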

Also, the client folder is excluded from antivirus scanning. Client CPU usage settings are at the default of <2% CPU (Current Balance Settings: Use CPU: True Entitlement: 0 WorkIdle: 10 SleepIdle: 480).

Any help on why this error occurs or how to solve the issue would be appreciated.
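As a rough check of that default figure: assuming the agent alternates between working for WorkIdle units and sleeping for SleepIdle units (this reading of the two values is an assumption, the log line does not spell it out), the share of time spent working is WorkIdle / (WorkIdle + SleepIdle). A minimal sketch:

```python
def duty_cycle(work_idle, sleep_idle):
    """Fraction of time spent working under a simple work/sleep alternation."""
    return work_idle / (work_idle + sleep_idle)


# Values taken from the Current Balance Settings quoted above.
print(f"{duty_cycle(10, 480):.2%}")  # prints 2.04%
```

So the default budget is roughly 2%, which is far below the 70% spikes being reported.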

Thanks

Niraj G

(imported comment written by AlanM)

The type of message indicates that this is happening during a gather operation.

Are these physical or virtual machines?

Is the site you are gathering large?

The error indicates that the client was trying to obtain this file from the relay and set its modification date/time to the right value. It also indicates that the file was busy for quite a while, as we tried 8 times to set this.
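To illustrate the kind of retry loop that would produce a message like "attempt 8 failed", here is a generic Python sketch. It is not the agent's actual implementation: the function name, the delay, and the use of `os.utime` are all assumptions chosen to show the pattern.

```python
import os
import time


def set_mtime_with_retry(path, mtime, attempts=8, delay=0.5):
    """Try to set a file's modification time, retrying when the OS refuses.

    Returns the attempt number that succeeded; re-raises the last error
    if every attempt fails (for example because another process holds the file).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            os.utime(path, (mtime, mtime))
            return attempt
        except OSError as err:
            last_error = err
            print(f"Retry error, attempt {attempt} failed for {path}: {err}")
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

If something else (such as a scanner) keeps the file open for the whole retry window, every attempt fails and the caller sees a file I/O error.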

(imported comment written by MBARTOSH)

I am very interested in this problem because we are planning to upgrade to 9.0.787 on January 8. However, this sounds like a show stopper. We definitely do not want our clients to have CPU spikes. In many cases the clients do not have CPU to spare. I hope others will respond to this thread if they have had this problem.

(imported comment written by AlanM)

The 9.0 clients are much more stable and have more control so this customer’s issue must be somewhat unique. I have only heard of any issues when sites are on the large side, such as the action site being over 100MB, which isn’t recommended.

You will see much better control in the 9.x agents, including the switch to CPU-based usage calculations, which helps virtual endpoints as well.

(imported comment written by NirajG)

Hi Alan,

There are no open actions on the clients except for one blank task a day. I am attaching older log files for better analysis. Let me know if debug logs are required.

We are currently subscribed only to external sites, including the LCM module and Software Use Analysis.

The machines involved are Windows clients (Windows XP, Windows 7) and servers (Windows Server 2008, Windows Server 2008 R2). Alan, other machines communicating with the same relay function fine.

We have already opened a PMR for this issue and are working with the support team. So far the root cause has not been identified, nor is there a solution.

(imported comment written by AlanM)

The issues I can see from your log are:

  • Relay selection: Relay selection is not doing well and has to go through many attempts before it reaches a working relay. This can be expensive and could be part of the issue, so you need to either prune your relay list or switch these clients to manual relay selection.

  • Virus scanning: It looks like something is scanning the __BESData directory and interfering with the agent’s work, as can be seen from some of the retry attempts.

I don’t believe I have seen a PMR related to this issue.

(imported comment written by NirajG)

Hi Alan,

Actionsite size : 8MB

Relay selection is currently manual and no actions are being sent.

You could check all the files associated with this system on PMR 14920,999,744.

Antivirus is currently set to exclude
C:\Program Files (x86)\BigFix Enterprise\BES Client and
C:\Program Files\BigFix Enterprise\BES Client folders on all machines.

Support has advised disabling antivirus, but the client does not approve of this.

Thanks

Niraj G

(imported comment written by MBARTOSH)

NirajG

Why isn’t the exclusion set at
C:\Program Files (x86)\BigFix Enterprise and
C:\Program Files\BigFix Enterprise? It seems like the relay folder is going to be included in scanning.
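A quick way to check which folders a given exclusion list actually covers is sketched below in Python. `is_excluded` is a hypothetical helper, and the relay folder name `BES Relay` is an assumption about the default install layout; substitute the real path on your relays.

```python
from pathlib import PureWindowsPath


def is_excluded(path, exclusions):
    """Return True if path equals, or sits underneath, any excluded folder."""
    target = PureWindowsPath(path)
    for folder in exclusions:
        root = PureWindowsPath(folder)
        if target == root or root in target.parents:
            return True
    return False


# Exclusions as described earlier in the thread.
current = [
    r"C:\Program Files (x86)\BigFix Enterprise\BES Client",
    r"C:\Program Files\BigFix Enterprise\BES Client",
]
# Broader exclusions as suggested in this post.
broader = [
    r"C:\Program Files (x86)\BigFix Enterprise",
    r"C:\Program Files\BigFix Enterprise",
]
# Assumed relay folder name, for illustration only.
relay_dir = r"C:\Program Files (x86)\BigFix Enterprise\BES Relay"

print(is_excluded(relay_dir, current))  # False: relay folder would still be scanned
print(is_excluded(relay_dir, broader))  # True
```

`PureWindowsPath` compares paths case-insensitively and does not touch the file system, so the check can be run on any machine.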

Also, I have checked and there does not seem to be an open PMR for this issue. It would be helpful to open one if you have not.

(imported comment written by MBARTOSH)

NirajG,

We are scheduled to upgrade to 9.0.787.0 tomorrow night, but I want to make sure I am not going to have the problems you are having. Would you mind sharing your PMR number so that I can follow up with support to find out whether we are going to have the same problem? We are also supporting WinXP and Win7 machines. How big is your action site? Is it greater than 100 MB? I think you can check this by looking at the size of the folder C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\actionsite.
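If checking by hand across many machines is tedious, the folder size can be summed with a few lines of Python. This is a sketch with hypothetical helper names; the 100 MB default comes from the figure mentioned earlier in the thread, and the folder path is whatever the actionsite directory is on the machine being checked.

```python
import os


def folder_size_bytes(folder):
    """Sum the sizes of all files under folder, skipping files that vanish or are locked."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file removed or inaccessible while walking
    return total


def exceeds_limit(folder, limit_mb=100):
    """Return True if the folder's total size is above limit_mb megabytes."""
    return folder_size_bytes(folder) > limit_mb * 1024 * 1024
```

Calling `exceeds_limit(path_to_actionsite_folder)` then answers the 100 MB question directly.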