I have 115 servers that act as a BES relay and client. I am having trouble with 26 of these clients not reporting in a timely manner. They are not receiving the new actionsite version immediately like the other 89 servers are.
They do not seem to be receiving the refresh command, and they take a long time to report into new relevance. Eventually they do report in, but hours later. They do check in immediately after I restart the client service, but after that they stop responding again.
So far we have tried to no avail:
reset bes client task
uninstall / reinstall of client
uninstall of relay
I am looking for some further troubleshooting advice. Thanks.
This is almost certainly happening because these devices are not getting the UDP notifications from the BigFix infrastructure.
You need to make sure the OS firewalls are set to allow incoming UDP over 52311 as well as the same on any hardware firewalls.
If the devices that are having this issue are behind a NAT, then you must put a relay behind the NAT to address this.
If you cannot get UDP to work, then you can enable command polling. This will cause the devices to check in proactively on an interval to check for new things to process. I recommend enabling command polling on all devices at once every 3 hours, and on devices that are not getting UDP notifications at once every hour. Command polling gives your relays slightly more work, but the extra load is usually insignificant overall.
Do not set command polling to run more often than once every 30 minutes, because in most cases this will cause the client to interrupt itself too often in order to poll, resulting in a never-ending eval loop.
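To make the interval advice concrete, here is a minimal Python sketch that turns a polling interval in hours into client setting values and rejects anything under the 30 minute floor. The setting names `_BESClient_Comm_CommandPollEnable` and `_BESClient_Comm_CommandPollIntervalSeconds` are my assumption of the relevant client settings and should be checked against your BigFix version; the helper `command_poll_settings` is purely illustrative.

```python
MIN_POLL_SECONDS = 30 * 60  # floor suggested in this thread: no more often than every 30 minutes


def command_poll_settings(interval_hours):
    """Return client setting name/value pairs for a given polling interval.

    The setting names are assumptions; verify them for your BigFix version.
    """
    seconds = int(interval_hours * 3600)
    if seconds < MIN_POLL_SECONDS:
        raise ValueError("polling more often than every 30 minutes is not recommended")
    return {
        "_BESClient_Comm_CommandPollEnable": "1",
        "_BESClient_Comm_CommandPollIntervalSeconds": str(seconds),
    }
```

For example, `command_poll_settings(3)` gives the values for the 3 hour interval suggested for all devices, and `command_poll_settings(1)` gives the values for devices that are not getting UDP notifications.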
I assume restarting the service initiates an immediate command poll, which is why it works immediately after that, correct?
I will double and triple check with my network engineer regarding this, but this is strange because this has worked in the past and I don't think we've made any major access list/firewall changes recently.
Yes, as does automatic relay selection, which normally happens every 6 hours.
Clients become aware of new items / actions through the following methods:
UDP Notification ( usually 10 to 30 seconds )
Command Polling Interval ( off by default, configurable )
Command Polling following an automatic relay selection ( once every 6 hours by default if applicable )
Command Polling following a restart of the client service (or computer as a whole)
the Gather Interval (24 hours by default)
Basically, by default, out of the box, clients only guarantee a check once every 24 hours, so you may have to wait as long as 24 hours for something new to be noticed if they just did a Gather right before you create/deploy something.
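The arithmetic behind that worst case can be sketched in a few lines of Python. This is only an illustration of the list above, assuming a missed UDP notification and using the default intervals quoted in this thread; the function name and parameters are made up for the example.

```python
GATHER_INTERVAL_HOURS = 24.0        # default gather interval quoted above
AUTO_RELAY_SELECT_HOURS = 6.0       # default automatic relay selection interval quoted above


def worst_case_wait_hours(poll_interval_hours=None, auto_relay_selection=False):
    """Longest time a running client may take to notice new content
    when the UDP notification never arrives."""
    candidates = [GATHER_INTERVAL_HOURS]
    if auto_relay_selection:
        # A command poll follows each automatic relay selection.
        candidates.append(AUTO_RELAY_SELECT_HOURS)
    if poll_interval_hours is not None:
        # Command polling is off by default; include it only when enabled.
        candidates.append(float(poll_interval_hours))
    return min(candidates)
```

With nothing enabled the bound is the 24 hour gather interval; turning on command polling at 3 hours brings it down to 3 hours.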
Command Polling is a way to ensure that the maximum amount of time a client will wait until checking in is lower when the system is awake and online.
Part of the reason I think using Command Polling on all machines is a good idea is that a computer that is asleep will not receive UDP notifications, which means it may have missed commands that it won't know about until it polls for commands again or gathers. The same is true if a laptop is changing WiFi networks and the UDP notification is sent to the old IP instead of the new one because it hasn't registered its new IP with the relays yet. This is why a machine that would otherwise get UDP notifications could miss them, and why command polling is a good fallback, even if it is ideal to try to ensure UDP is working.
I verified with our network admin that we definitely have UDP port 52311 opened up at all 115 locations and the ACL is identical in regards to BES traffic, which would explain why some of our clients are working correctly.
For testing purposes, he actually opened up all network traffic between the BES host and one of the offending clients, and this did not resolve the issue. The client still does not receive the UDP refresh messages.
We don't have command polling enabled, but we will definitely consider turning it on for the reasons you mentioned. In the meantime, I am still stumped as to the source of the UDP notification problem.
The logs on the client don't show much (in my opinion).
Are there logs on the BES host that I can start looking through for clues? Let me know what you think, thanks.
Do you have Windows Firewall or another host-based firewall on these particular Relays that could be blocking it?
Are these Relays also doing something that's port-heavy, like acting as a DNS server? If another service happens to grab UDP port 52311 before the BES Client service starts, the BES Client won't be able to listen on that port. Check with something like Windows Resource Monitor, Sysinternals' TCPView, or "netstat -anob" to see if another process is listening on udp/52311.
There are actually a couple of fixlets in the BES Support Site to deal with this. They are written specifically to flag on DNS servers, but really this could occur on any system; it is just more likely on a system that uses a large number of dynamically-allocated ("ephemeral") ports (like DNS).
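If you would rather script the check, a minimal Python sketch is below. It only tries to bind the UDP port and reports whether something already holds it; it cannot tell you which process that is, so you still need netstat or TCPView for that. The function name is made up for illustration. Note that a healthy, running BES Client will itself hold the port, so this is most useful with the client service stopped.

```python
import socket


def udp_port_is_free(port, host="0.0.0.0"):
    """Return True if the UDP port can be bound, False if the bind fails.

    A failed bind usually means another socket already holds the port,
    but any OSError (for example a permissions problem) is treated the same way.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

Calling `udp_port_is_free(52311)` with the BES Client service stopped should return True; if it returns False, something else (such as DNS) has taken the port.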
Fixlet 597 Reserve port for BES Client for DNS Servers
Task 765 Reserve port for BES Client for DNS Servers (Windows 2008 / 2008 R2)
I don't see the harm in adding the ReservedPorts entry on every client installation (unless you consider that one might run BigFix on an alternate port rather than 52311).
That's good then. I'm considering putting in an RFE to have the BES Client installer itself put a reserved port into the registry, without a need to run the Fixlet at all. But that would compete with the RFEs that I care more about.
The ReservedPorts registry key, even with hotfix KB2665809, did not resolve this issue on all of my servers. DNS was still grabbing port 52311 sometimes.
After doing some more research I found this key:
value "SocketPoolExcludedPortRanges" of key "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\DNS\Parameters" of (native registry)
which DID finally fix my problem when I added 52311-52311 to that value. After doing that, the DNS service would correctly exclude that port from being used.
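If you need to roll this out to several DNS servers, the list handling can be sketched in Python as below. This assumes the value holds a list of "low-high" strings, as the 52311-52311 entry above suggests; the helper name is hypothetical and the sketch deliberately does not touch the registry, it only works out what the list should contain.

```python
def add_excluded_port(ranges, port):
    """Return a copy of `ranges` that covers `port`.

    `ranges` is a list of "low-high" strings (assumed format). If the port is
    already inside an existing range the list is returned unchanged, otherwise
    a single-port range such as "52311-52311" is appended.
    """
    for entry in ranges:
        if not entry.strip():
            continue  # ignore blank entries
        low, _, high = entry.partition("-")
        if int(low) <= port <= int(high or low):
            return list(ranges)
    return list(ranges) + ["{0}-{0}".format(port)]
```

Starting from an empty list, `add_excluded_port([], 52311)` yields the single entry described above, and running it again on the result changes nothing, so it is safe to apply repeatedly.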