Potential Memory Leak - BF Agent and Server 2008 R2

After some in-depth testing, my group has found a potential memory leak when the BF Agent is installed on a Server '08 R2 machine. Here is what we are seeing.
Server '08 R2 is installed fresh, with no additional software added other than the BF Agent. Once the machine checks in to our WSUS server for updates, TrustedInstaller.exe starts to grab excessive memory. Over the course of the weekend, TrustedInstaller.exe had grabbed in excess of 6GB of RAM. This obviously becomes an issue as the system starts having stability problems.

We have tried fully patching the server OS, same result. We have also tried 3-4 older versions of the BF agent with no change.

In our environment, we currently have Server '08 R2 in several key locations, acting as relay servers, so this is a bit of a problem.

Has anyone seen this issue?


I have a variety of 2008 R2 machines working properly. I’m a bit curious why you think it is BigFix when WSUS and TrustedInstaller are involved? Have you tried with BigFix not installed? From your post it seems it may be a Windows Update issue.

What leads me to start with BF is that nothing is installed on the machine other than BF.
We have taken a completely fresh build of Windows Server '08 R2, both fully patched and unpatched (outside of any updates that were pulled from the CD\ISO), and the issue does not appear until the BF agent is pushed to the machine.

We’ve even removed the BF agent and things go back to normal. Once the agent is re-introduced, the problem occurs again.

Odd. Does the agent appear to be doing anything from the logs?

Nothing out of the ordinary.
I just pushed version 9.5.5.196 to see if anything changes.

Question for you @patchingout, what version of BF are you running?

I am running 9.5.8.38. I was thinking something may be running an action through it.

Trustedinstaller is notorious for leaking memory. Not sure it’s a Bigfix issue.

Check https://www.reddit.com/r/sysadmin/comments/3frcib/heads_up_kb3050265_fixes_major_memory_leak_in/
And see whether you’ve applied the related hotfixes.


@patchingout We are also running 9.5.8.38.
I don’t see any actions being applied through the logs.

@JasonWalker When testing, literally the only thing installed on the server is the BigFix Client, and the issue is not seen when BigFix is not installed. Going to try installing the suggested KB from your posted article to see if there is any change.

The TrustedInstaller memory leaks I’ve seen can be tied to very minute differences on the system, like whether there is a user logged on or whether Server Manager is left open.

The Server Manager-triggered leak appears to be caused by Server Manager refreshing the list of installed features/roles according to one of those blogs’ musings. I wonder whether an Analysis looking at roles/features may trigger the same? Or really any number of other WMI queries might do it as well.

@JasonWalker Going off a hunch from your analysis comment: do you know of a way to create a DMZ of sorts for BigFix computers? The thought would be a computer group that is not assigned any analyses, open actions, etc.

It’d be a matter of how you assign Operator Rights to the machine. You’d want it to have no Operators, except perhaps a test Operator with no analyses activated except what you want to test.

And you’d have to ensure you’re not using a Master Operator for actions/analyses you want to test, because there’s not a way to prevent a MO account from affecting the client.


You would also exclude the test computer from all custom & external sites, which would help. Any analyses in the master actionsite can’t be avoided, but those could have relevance added to exclude the test system as well.

Analyses that are globally activated in BES Support are unavoidable.

Good news @JasonWalker and @jgstew . I believe I have found the cause of all my frustrations.
After enabling the debug level logging on my test machine (Running Windows Server '08 R2) I was able to pin down some interesting information.
In conjunction with the debug log, I used procmon to monitor when the TrustedInstaller.exe process started, so that I could try to line up what the BF Client was doing when the process was initiated.
Here is what I found, and have been able to repeat.

At the current moment, 2 fixlets seem to be the cause. Both were imported into BF back in October '17 from a C3 Protect site on Bigfix.me. When the client runs through the relevance statements for each fixlet (3 statements in total), TrustedInstaller behaves in the following manner.

  • When TrustedInstaller.exe is not running on the server, each relevance statement on its own starts the TrustedInstaller.exe process with an inflated amount of RAM usage right off the bat (+60,000K). I was able to determine this by stopping the BES Service and running the relevance statements individually through the QNA tool.

  • When TrustedInstaller.exe is already running and each relevance statement is checked, a small amount of additional RAM is added to the running process, anywhere from 200-600K each time the relevance is checked.
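
For anyone who wants to repeat this measurement, here is a minimal Python sketch that reads the memory usage of TrustedInstaller.exe from the standard Windows `tasklist` command, so it can be sampled before and after each relevance check. The function names are made up for illustration, and the parsing assumes the usual five-column CSV output of `tasklist /FO CSV /NH`.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output):
    """Map PID -> memory usage in KB from `tasklist /FO CSV /NH` output."""
    usage = {}
    for row in csv.reader(io.StringIO(output)):
        # Expected columns: image name, PID, session name, session#, mem usage.
        if len(row) != 5:
            continue  # e.g. the "INFO: No tasks are running..." message
        # Mem usage looks like "61,234 K"; keep only the digits.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if row[1].isdigit() and digits:
            usage[int(row[1])] = int(digits)
    return usage


def trustedinstaller_kb():
    """Sample TrustedInstaller.exe memory usage right now (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq TrustedInstaller.exe",
         "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `trustedinstaller_kb()` before and after running a statement in the QNA tool and subtracting the two values should give the per-check growth. An empty result means the process is not running.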

So my assumption is that when these relevance statements were being evaluated at each report cycle, they were the cause of the ever-inflating TrustedInstaller.exe. (We were seeing TrustedInstaller.exe using upwards of 6GB of RAM across our relay servers.)
I decided to remove all computer subscriptions from the custom site that I had added these fixlets to. After removing these 2 fixlets from the site, I will slowly re-add computers once I determine that both of these fixlets are in fact the root of the problem.
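
As a rough sanity check on those numbers, the arithmetic can be sketched in Python: starting from about 60,000K and growing 200-600K per check, how many relevance checks would it take to reach 6GB? The helper name is hypothetical, and the calculation assumes the growth per check stays constant, which the observations above do not guarantee.

```python
import math

KB_PER_GB = 1024 * 1024


def evaluations_to_reach(target_gb, start_kb, growth_kb_per_eval):
    """Number of checks needed to grow from start_kb to target_gb."""
    remaining_kb = target_gb * KB_PER_GB - start_kb
    return math.ceil(remaining_kb / growth_kb_per_eval)


for growth in (200, 600):
    n = evaluations_to_reach(6, 60_000, growth)
    print(f"{growth}K per check: about {n:,} checks to reach 6GB")
```

That works out to somewhere between roughly 10,000 and 31,000 checks. Spread over a weekend of, say, 60 hours (an assumed figure), that is a few checks per minute, which seems plausible for statements that are re-evaluated continuously.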

I am going to continue to do some testing to determine if there are any other fixlets in that site that are causing problems or if it is just those 2.
Here are the Fixlets and their associated relevance statements. If someone would like to do some independent testing with Server '08 R2, let me know if you are seeing the same result.

  1. Config - Hyper-V Platform - Enable - Windows

    set of (string values of properties "Caption" of select objects "* from win32_optionalfeature where installstate= 1" of wmi) does not contain "Hyper-V Platform"
    
    set of (string values of properties "Caption" of select objects "* from win32_optionalfeature" of wmi) contains "Hyper-V Platform"
    
  2. Config - Hyper-V Platform - Disable - Windows

     set of (string values of properties "Caption" of select objects "* from win32_optionalfeature where installstate= 1" of wmi) contains "Hyper-V Platform"
    

Any testing or further information about these particular relevance statements would be great!


Very impressive debugging! I’ve reported the issue here: https://github.com/strawgate/C3-Protect/issues/33

This actually seems like an odd flaw in WMI / trustedinstaller itself that this relevance is tripping. This definitely shouldn’t be happening and these WMI calls shouldn’t be doing this, but I wonder if just rewriting these relevance statements slightly would help.

Are all updates installed for this OS? Including any recommended ones? I’m not sure if there are updates or hotfixes for WMI / Trusted installer specifically that could address this.

There could be some sort of flaw with the BigFix client not terminating the thread that runs the WMI call, but if that were the case, I would expect this issue to be much more widespread and be caused by any WMI call.


This is how I would try rewriting the relevance to be more efficient:

Enable:

not exists string values whose(it = "Hyper-V Platform") of selects "Caption from win32_optionalfeature where installstate= 1" of wmis

AND:

exists string values whose(it = "Hyper-V Platform") of selects "Caption from win32_optionalfeature" of wmis

Disable:

exists string values whose(it = "Hyper-V Platform") of selects "Caption from win32_optionalfeature where installstate= 1" of wmis

This may be a workaround as well: https://github.com/jgstew/bigfix-content/blob/master/fixlet/clientsettings/Set%20__BESClient_Resource_PowerSaveEnable_%20to%20_1_%20-%20Universal.bes

CC: @strawgate @AlanM


Plugging in these newly created relevance statements causes the same behavior :smile:


Wow, really? Confirmed that quickly?

I would recommend trying this possible workaround: https://github.com/jgstew/bigfix-content/blob/master/fixlet/clientsettings/Set%20__BESClient_Resource_PowerSaveEnable_%20to%20_1_%20-%20Universal.bes

I strongly recommend this setting for all VMs, docker containers, and battery powered devices to limit idle CPU usage, but it stops the eval loop for 10 minutes (by default) if no changes are detected, which might actually solve this issue.

Applicability relevance (fixlets/tasks/etc) is run on an infinite loop that is CPU usage limited. This setting causes the loop to pause when no changes are detected for 2 loops in a row, then continue to pause for 10 minutes after each subsequent loop after that as long as no further changes are detected.
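
To make that pause behavior concrete, here is a small Python model of the loop as described in this post. It is only a sketch of the description, not the client's actual implementation; the function name and parameters are invented for illustration, with the 10 minute default expressed as 600 seconds.

```python
def pauses_after_loops(changes_detected, pause_seconds=600,
                       idle_loops_before_pause=2):
    """Return the pause (in seconds) taken after each evaluation loop.

    changes_detected: one boolean per loop, True if that loop saw a change.
    Models the description in this thread only, not the real client code.
    """
    pauses = []
    idle_streak = 0
    for changed in changes_detected:
        # Any change resets the streak; otherwise the idle streak grows.
        idle_streak = 0 if changed else idle_streak + 1
        # Pause only once enough consecutive loops have seen no changes.
        if idle_streak >= idle_loops_before_pause:
            pauses.append(pause_seconds)
        else:
            pauses.append(0)
    return pauses
```

For example, `pauses_after_loops([True, False, False, False, True, False])` gives a pause after the third and fourth loops only, and a change in the fifth loop puts the client back into continuous evaluation. On an idle server this would cut the number of evaluations, and therefore the number of WMI calls, considerably.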

The only case where this could have a negative effect is when you are in the middle of aggressively patching a server during a maintenance window, but even then, things should be changing often enough that the loop shouldn’t pause.


Since it appears to be leaking from trustedinstaller / wmi, can you check the hotfix status for the trustedinstaller memory leaks?


Also, I created an issue for this problem in the C3 github: https://github.com/strawgate/C3-Protect/issues/33

It seems like this isn’t the relevance’s fault, so might not be anything to fix on the C3 side of things.

I’ve also found 2 more fixlets that cause the same type of problem. All 4 of these fixlets have the “win32_optionalfeature” query in common

  1. Config - Isolated User Mode - Enable - Windows

    set of (string values of properties "Caption" of select objects "* from win32_optionalfeature where installstate= 1" of wmi) does not contain "Isolated User Mode"

    set of (string values of properties "Caption" of select objects "* from win32_optionalfeature" of wmi) contains "Isolated User Mode"

  2. Config - Isolated User Mode - Disable - Windows

    set of (string values of properties "Caption" of select objects "* from win32_optionalfeature where installstate= 1" of wmi) does not contain "Isolated User Mode"

    set of (string values of properties "Caption" of select objects "* from win32_optionalfeature" of wmi) contains "Isolated User Mode"
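
Since the common factor appears to be the win32_optionalfeature query, one way to look for other affected fixlets in a site is to search their relevance text for that WMI class. Below is a minimal Python sketch; it assumes the relevance statements have already been collected into a dictionary keyed by fixlet title (how they get exported is left open), and the function name and the second sample entry are hypothetical.

```python
import re

# Match "from win32_optionalfeature" regardless of case or spacing.
WMI_CLASS = re.compile(r"\bfrom\s+win32_optionalfeature\b", re.IGNORECASE)


def fixlets_querying_optionalfeature(fixlets):
    """fixlets: dict mapping fixlet title -> list of relevance strings."""
    return sorted(
        title
        for title, statements in fixlets.items()
        if any(WMI_CLASS.search(statement) for statement in statements)
    )


SAMPLE = {
    "Config - Hyper-V Platform - Disable - Windows": [
        'set of (string values of properties "Caption" of select objects '
        '"* from win32_optionalfeature where installstate= 1" of wmi) '
        'contains "Hyper-V Platform"',
    ],
    # Hypothetical fixlet that does not touch the WMI class.
    "Hypothetical unrelated fixlet": [
        'name of operating system as lowercase contains "win"',
    ],
}

print(fixlets_querying_optionalfeature(SAMPLE))
```

Any fixlet this flags would still need to be confirmed against TrustedInstaller.exe memory usage on a test machine, since matching the query text only shows that the class is referenced.
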