Clients failing to refresh unless service is restarted

eboth225 · June 21, 2016, 8:10pm

I have 115 servers that act as a BES relay and client. I am having trouble with 26 of these clients not reporting in a timely manner. They are not receiving the new actionsite version immediately like the other 89 servers are.

They do not seem to be receiving the refresh command, and they take a long time to report into new relevance. Eventually they do report in, but hours later. They do check in immediately after I restart the client service, but after that they stop responding again.

So far we have tried to no avail:

reset bes client task
uninstall / reinstall of client
uninstall of relay

I am looking for some further troubleshooting advice. Thanks.

jgstew · June 21, 2016, 8:58pm

This is almost certainly happening because these devices are not getting the UDP notifications from the BigFix infrastructure.

You need to make sure the OS firewalls are set to allow incoming UDP over 52311 as well as the same on any hardware firewalls.

If the devices that are having this issue are behind a NAT, then you must put a relay behind the NAT to address this.

If you cannot get UDP to work, then you can enable command polling. This will cause the devices to check in proactively on an interval to check for new things to process. I recommend enabling command polling on all devices at once every 3 hours, and on devices that are not getting UDP notifications at once every hour. Command polling gives your relays slightly more work, but the extra load is usually insignificant overall.

Do not set command polling to run more often than once every 30 minutes, because in most cases the client will then interrupt itself too often in order to poll, causing a never-ending eval loop.

See here: https://bigfix.me/fixlet/details/3798
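If you would rather script this than take the fixlet as-is, here is a minimal Python sketch that builds the two ActionScript lines for command polling and refuses intervals below the 30-minute floor mentioned above. The setting names `_BESClient_Comm_CommandPollEnable` and `_BESClient_Comm_CommandPollIntervalSeconds` are not quoted in this thread; they are my understanding of the standard client settings, so check them against the linked fixlet. The function name is made up for illustration.

```python
MIN_POLL_SECONDS = 30 * 60  # floor suggested in this thread: no more often than every 30 minutes


def command_polling_actionscript(interval_hours: float) -> list[str]:
    """Return ActionScript lines that enable command polling at the given interval."""
    seconds = int(interval_hours * 3600)
    if seconds < MIN_POLL_SECONDS:
        raise ValueError("polling more often than every 30 minutes risks an endless eval loop")
    # Assumed setting names; confirm against your BigFix version before deploying.
    stamp = '{parameter "action issue date" of action}'
    return [
        f'setting "_BESClient_Comm_CommandPollEnable"="1" on "{stamp}" for client',
        f'setting "_BESClient_Comm_CommandPollIntervalSeconds"="{seconds}" on "{stamp}" for client',
    ]


if __name__ == "__main__":
    for line in command_polling_actionscript(3):  # all devices: every 3 hours
        print(line)
    for line in command_polling_actionscript(1):  # devices missing UDP: every hour
        print(line)
```

The generated lines would go into a task's action; the interval is given in hours only to match the recommendations above.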

eboth225 · June 21, 2016, 9:11pm

I assume restarting the service initiates an immediate command poll, which is why it works immediately after that, correct?

I will double and triple check with my network engineer regarding this, but this is strange because this has worked in the past and I don’t think we’ve made any major access list/firewall changes recently.

Regardless, thanks for the quick reply.

jgstew · June 21, 2016, 10:24pm

Yes, as does automatic relay selection, which normally happens every 6 hours.

Clients become aware of new items / actions through the following methods:

UDP Notification ( usually 10 to 30 seconds )
Command Polling Interval ( off by default, configurable )
Command Polling following an automatic relay selection ( once every 6 hours by default if applicable )
Command Polling following a restart of the client service (or computer as a whole)
the Gather Interval (24 hours by default)

Basically, by default, out of the box, clients only guarantee a check once every 24 hours, so you may have to wait as long as 24 hours for something new to be noticed if they just did a Gather right before you create/deploy something.

Command Polling is a way to ensure that the maximum amount of time a client will wait until checking in is lower when the system is awake and online.

Part of the reason I think using Command Polling on all machines is a good idea is that a computer that is asleep will not receive UDP notifications, which means it may have missed commands that it won’t know about until it polls for commands again or gathers. The same is true if a laptop is changing WiFi networks and the UDP notification is sent to the old IP instead of the new one because it hasn’t registered its new IP with the relays yet. This is why a machine that would otherwise get UDP notifications could miss them, and why command polling is a good fallback, even if it is ideal to try to ensure UDP is working.
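To make the arithmetic concrete, here is a rough Python sketch of the worst-case wait for a client that is not receiving UDP notifications. It assumes the client simply notices new content at whichever enabled interval is shortest, which is a simplification of real client behavior. The function and parameter names are made up for illustration, and the default values are the ones listed above.

```python
from typing import Optional


def worst_case_wait_hours(
    command_poll_hours: Optional[float] = None,  # command polling is off by default
    auto_relay_selection: bool = False,          # only counts if automatic relay selection is in use
    relay_selection_hours: float = 6.0,          # default from the list above
    gather_hours: float = 24.0,                  # default from the list above
) -> float:
    """Longest time a client missing UDP notifications may wait before noticing something new."""
    candidates = [gather_hours]
    if auto_relay_selection:
        candidates.append(relay_selection_hours)
    if command_poll_hours is not None:
        candidates.append(command_poll_hours)
    # The shortest enabled interval bounds the wait.
    return min(candidates)


if __name__ == "__main__":
    print(worst_case_wait_hours())                           # out of the box
    print(worst_case_wait_hours(auto_relay_selection=True))  # with automatic relay selection
    print(worst_case_wait_hours(command_poll_hours=1))       # hourly command polling
```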

eboth225 · June 23, 2016, 9:59pm

I verified with our network admin that we definitely have UDP port 52311 opened up at all 115 locations and that the ACL is identical with regard to BES traffic, which would explain why we have some clients that are working correctly.

For testing purposes, he actually opened up all network traffic between the BES host and one of the offending clients, and this did not resolve the issue. The client still does not receive the UDP refresh messages.

We don’t have command polling enabled, and we will definitely consider turning it on for the reasons you mentioned, but in the meantime I am still stumped as to the source of the UDP notification problem.

The logs on the client don’t show much (in my opinion).

Are there logs on the BES host that I can start looking through for clues? Let me know what you think, thanks.

JasonWalker · June 23, 2016, 10:07pm

Do you have Windows Firewall or another host-based firewall on these particular Relays that could be blocking it?

Are these Relays also doing something that’s port-heavy like acting as a DNS server? If another service happens to grab UDP port 52311 before the BES Client service starts, the BES Client won’t be able to listen on that port. Check with something like Windows Resource Monitor, or Sysinternals’ TcpView, or ‘netstat -anob’ to see if another process is listening on udp/52311.
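If you want a quick scripted check alongside those tools, a minimal Python sketch is to try binding the UDP port yourself: if the bind fails, something most likely already holds it. This only tells you that the port is taken, not which process has it, so it complements the tools above and does not replace them. The function name is made up for illustration.

```python
import socket


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if the UDP port cannot be bound, most likely because something holds it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError:
        return True
    else:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # Run with the BES Client service stopped; otherwise the client itself holds the port.
    print("udp/52311 in use:", udp_port_in_use(52311))
```

With the BES Client stopped, a result of True would suggest another service has grabbed the port.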

eboth225 · June 23, 2016, 10:31pm

We DO run a DNS server on the same machine that hosts the relay/client.

Through Resource Monitor I see this:
Image          PID   Address           Port   Protocol  Firewall Status
dns.exe        1592  IPv4 unspecified  52312  UDP       Allowed, not restricted
BESClient.exe  1380  IPv6 unspecified  52311  UDP       Allowed, not restricted
BESClient.exe  1380  IPv4 unspecified  52311  UDP       Allowed, not restricted
dns.exe        1592  IPv4 unspecified  52310  UDP       Allowed, not restricted

…but the IPv4 one is sort of blinking on and off every second. It keeps disappearing and reappearing in the list, whereas the IPv6 is constant.

I checked a different client that is not experiencing this problem and the IPv4 is not “blinking”, it is constant in the list of listening ports.

I think we may be on to something here.

eboth225 · June 23, 2016, 10:48pm

Problem solved! The solution:

Add port 52311 as a reserved port in the registry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

Excellent call JasonWalker!
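For reference, a hedged Python sketch of the registry change described above. The thread only gives the key; the value name `ReservedPorts` (a REG_MULTI_SZ holding `start-end` ranges) is my understanding of the Windows setting and should be verified for your OS version. The helper names are made up for illustration, the write needs administrator rights, and a restart is probably needed before the change takes effect.

```python
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"  # key given in this thread
VALUE_NAME = "ReservedPorts"  # assumed value name, not quoted in the thread


def with_reserved_port(existing: list[str], port: int) -> list[str]:
    """Return the range list with 'port-port' appended if it is not already present."""
    entry = f"{port}-{port}"
    cleaned = [e for e in existing if e.strip()]  # drop empty strings
    return cleaned if entry in cleaned else cleaned + [entry]


def reserve_port_in_registry(port: int = 52311) -> list[str]:
    """Windows only: merge the port into the multi-string value and write it back."""
    import winreg  # imported here so the helper above still runs on other platforms

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, access) as key:
        try:
            current, _ = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            current = []  # value does not exist yet
        updated = with_reserved_port(list(current), port)
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_MULTI_SZ, updated)
    return updated


if __name__ == "__main__":
    # Dry run of the merge logic only; call reserve_port_in_registry() on Windows to apply it.
    print(with_reserved_port(["1433-1434"], 52311))
```

Merging into the existing list keeps any ranges that other software has already reserved.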

JasonWalker · June 24, 2016, 2:38pm

I’ve had problems with that one too.

There are actually a couple of fixlets in the BES Support Site to deal with this. They are written specifically to flag on DNS servers, but really this could occur on any system; it is just more likely on a system that uses a large number of dynamically allocated (“ephemeral”) ports, like a DNS server.

Fixlet 597 Reserve port for BES Client for DNS Servers
Task 765 Reserve port for BES Client for DNS Servers (Windows 2008 / 2008 R2)

I don’t see the harm in adding the ReservedPorts entry on every client installation (unless you consider that one might run BigFix on an alternate port rather than 52311).

AlanM · June 24, 2016, 8:11pm

The fixlets should be pulling the port that the agent is using so it doesn’t matter what your port is

JasonWalker · June 24, 2016, 8:27pm

That’s good then. I’m considering putting in an RFE to have the BES Client installer itself put a reserved port into the registry, without any need to run the Fixlet at all. But that would compete with the RFEs that I care more about.

eboth225 · September 12, 2016, 3:35pm

This is an update to my original situation.

The ReservedPorts registry value, even with hotfix KB2665809, did not resolve this issue on all of my servers. DNS was still grabbing port 52311 sometimes.

After doing some more research I found this key:
value "SocketPoolExcludedPortRanges" of key "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\DNS\Parameters" of (native registry)

which DID finally fix my problem when I added 52311-52311 into that key. After doing that, the DNS service would correctly exclude that port from being used.
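A small Python sketch for sanity-checking what goes into that value: it parses `start-end` entries like the 52311-52311 above and reports whether a given port is covered. It works on a plain list of strings, so you can feed it whatever you read back from the registry. The function names are made up for illustration.

```python
def parse_port_range(entry: str) -> tuple[int, int]:
    """Parse a 'start-end' entry such as '52311-52311' into (start, end)."""
    start_text, sep, end_text = entry.strip().partition("-")
    if not sep:
        raise ValueError(f"expected 'start-end', got {entry!r}")
    start, end = int(start_text), int(end_text)
    if not (0 < start <= end <= 65535):
        raise ValueError(f"invalid port range: {entry!r}")
    return start, end


def port_is_excluded(port: int, entries: list[str]) -> bool:
    """Return True if any range in entries contains the port."""
    return any(start <= port <= end for start, end in map(parse_port_range, entries))


if __name__ == "__main__":
    ranges = ["52311-52311"]  # the entry added in this thread
    print(port_is_excluded(52311, ranges))
    print(port_is_excluded(52312, ranges))
```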