Odd BESClient behavior on laptops

I’ve seen this happen twice so far.

This is on Windows laptops with the BESClient installed. The local firewall is NOT blocking UDP traffic, but the clients are not seeing the UDP messages from the Relays or Server and fail to begin processing Actions targeted at them.

In the most recent case, I reached out to the laptop and restarted the BESClient service. It started processing the action, and it is now stuck on Pending Downloads, presumably because it can’t see the UDP message from its Relay telling it that the downloads are now available.

In the first case where I saw this, uninstalling and then reinstalling the client got it working properly.

Has anyone else seen this before? Or does anyone have any suggestions? (short of Command Polling).


I wonder if the laptop IP address changed through DHCP, which would then cause the relay to be sending the UDP notifications to the wrong IP until the client re-registered. I believe the client is supposed to handle IP changes and re-register, but I’m not sure how quickly that happens.

If possible, you should check what the client’s IP was at the time as well as what the relay thought the IP was for that client. If this happens again, you could also shut down the BES Client service and then start something to monitor for UDP on the BES Client port and see if the UDP packets are actually coming through or not.
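As a rough sketch of the "monitor for UDP" idea above, something like the following could be run on the laptop while the BES Client service is stopped (otherwise the port is already in use). It assumes the default BigFix port 52311; the helper names `open_listener` and `collect` are made up for illustration, and this is not a BigFix tool.

```python
import socket

BES_PORT = 52311  # assumed default BigFix port; change it if your deployment uses another


def open_listener(port=BES_PORT, host="0.0.0.0"):
    """Bind a UDP socket on the given port (the BES Client service must be stopped)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock, timeout=60.0, max_packets=10):
    """Return (source_ip, payload_length) for each datagram received.

    Stops after max_packets datagrams, or when no datagram arrives
    within `timeout` seconds of the previous one.
    """
    seen = []
    sock.settimeout(timeout)
    try:
        while len(seen) < max_packets:
            data, addr = sock.recvfrom(65535)
            seen.append((addr[0], len(data)))
            print(f"UDP from {addr[0]}:{addr[1]} ({len(data)} bytes)")
    except socket.timeout:
        pass  # nothing arrived within the timeout window
    return seen
```

For example, `collect(open_listener())` and then send a test Action or a force refresh from the console. If nothing is printed, the packets are probably not reaching the machine at all, which points at the network rather than the client.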

This seems less likely, but perhaps a firewall allows inbound UDP on the port only for a while after it sees outbound TCP from the client on that same port. Once it has not seen outbound TCP for long enough, it starts rejecting the inbound UDP for the client.

You might consider lowering the client registration interval, particularly for mobile clients.


It is still a good idea to investigate what is happening with the UDP notifications, but I would recommend the following regardless:

I would strongly recommend reducing the Download RetryMinutes to 1, which used to be the default, but was changed to 10.

I would recommend enabling command polling to be at least once every 24 hours for all clients, and more aggressive for mobile clients.

To get the client to respond to the Action at all, I stopped and restarted the client service. This should have caused the client to re-register with its relay. In fact, when I look at the Registration IP Address of the client, it matches the current IP of the device.

I’ll see if I can get the owner of the laptop to surrender it for a bit of hands on testing.

I’m quite sure the IPs matched after the client was restarted, but I’m wondering if they did not match before the client was restarted.

The issue is if the UDP notifications were sent while the client had the wrong IP, they aren’t sent again when the client re-registers. The relay has no way to know if the client got it or not.

If a client system is sleeping, it will miss all UDP notifications. If it wakes up, it won’t suddenly get the notifications it missed unless it polls for commands. This is why I recommend ALL clients have command polling enabled for at least once every 24 hours, even desktops on the LAN. I believe we have all of our clients set to use command polling at least once every 12 hours and I’d like to try lowering that even further to something like 3 hours.

According to IBM, Command Polling primarily causes extra load on the Relays and not the Root Server itself, which makes sense. At least in our environment, load on the relays is not as much of a concern as load on the Root Server, although our Root Server doesn’t seem to have any issues except for our number of simultaneous console sessions.

@jgstew, I have your “BES Client Info - Universal” Analysis imported and enabled for laptops (including this one). According to the “Last Command Time (UDP)” property, this machine has NEVER received a UDP message. At least, it lists a value of (none) for the property.

I sent another Action for the Utility I’ve been trying to test with this laptop. The BESClient never saw the UDP message telling it to come look for new Actions so I restarted the client again. The Relay already had the file the client needed and it was able to download it and run it.

I have yet to reach the owner of the laptop to see if I can get my hands on it for a while.

This makes it sound like a network/firewall issue. It seems this client never receives UDP packets. Restarting the client causes something similar to a one time command polling, which is why things start happening at that point.

  • Does the client have a public or private IP?
  • Is the client behind a hardware firewall?
  • Is the client behind a NAT?
  • Is there an OS or other software firewall blocking the incoming UDP
    packets?
  • Is there some other software conflicting with the BES Client by
    listening for UDP on the same port?
  • Does the client have a public or private IP?
      - The client has an IP address from a large IP Subnet used by our Wireless network. Other devices in the same Wireless environment are working fine.
  • Is the client behind a hardware firewall?
      - No, see above regarding Wireless connectivity.
  • Is the client behind a NAT?
      - No, other devices in the Wireless subnet are working fine.
  • Is there an OS or other software firewall blocking the incoming UDP packets?
      - I don’t think so. There is no relevant content regarding BES Traffic being blocked.
  • Is there some other software conflicting with the BES Client by listening for UDP on the same port?
      - Unknown at this point.

Interestingly, when I let your Analysis report against all the Laptops, there are ~450 that have < none > listed for their Last Command Time (UDP).

For diagnostic purposes, I’ve copied the Relevance from the Last Command Time (UDP) property into a new Analysis where the property will refresh every 30 minutes. If I understand the property correctly, there really should be almost NO clients with < none > as their value for the property unless they are brand new client installs.

Is that accurate?

If my understanding is correct, then (last command times of client) should have a value other than < none > as long as the client has received a UDP notification at some point, which should happen very regularly, unless all operators with management rights never use the console to do much of anything.

In this post, @AlanM suggests that the relevance will return none if the client is restarted, but then suggests that is not the case and that it should behave the way I described above.

This is meant to be taken as a policy action to Automatically enable Command Polling on clients that have not received UDP commands/notifications.

You can check the logs for UDP commands: http://bigfix.me/analysis/details/2994616

number of lines whose(it contains "GatherHashMV command received" OR it contains "DownloadPing command received" OR it contains "ForceRefresh command received") of files whose((name of it ends with ".log" OR name of it ends with ".bkg") AND exists lines of it) of folder "__BESData\__Global\Logs" of parent folder of client
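If you have a copy of a client’s log folder on hand, the same count can be approximated outside of BigFix. This is only a sketch that mirrors the relevance above: the function name `count_udp_commands` is hypothetical, and matching the file extension case-insensitively is an assumption on my part.

```python
from pathlib import Path

# The three log messages the relevance above searches for.
UDP_COMMANDS = (
    "GatherHashMV command received",
    "DownloadPing command received",
    "ForceRefresh command received",
)


def count_udp_commands(log_dir):
    """Count lines mentioning a UDP command in the .log and .bkg files of log_dir."""
    total = 0
    for path in Path(log_dir).iterdir():
        # Only look at regular files named *.log or *.bkg
        if not path.is_file() or path.suffix.lower() not in (".log", ".bkg"):
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(
                1 for line in handle
                if any(command in line for command in UDP_COMMANDS)
            )
    return total
```

Point it at a copy of the `__BESData\__Global\Logs` folder. A result of 0 on a machine that should have been notified recently is the interesting case.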

Interesting. I added a Property to count the number of UDP Commands (30 minute refresh) to the Analysis, and there are machines with < none > for the last command time AND non-zero counts for UDP Commands in the log files. I also have a property that calculates the Client Age:

now - subscribe time of current site

and I have machines where the client has been installed for Years (current max is 826 days), but still report < none > for the Last Command Times of Client.

Oh, and the problem is not limited to Laptops, I only noticed it first on a pair of laptops.

What is the version of the client? Is it older than 8.2?

Maybe last command time used to get reset after every reboot, but now in a later version of the client it does not?

You can just send a force refresh which will send a UDP command and see if it is updated.

Use this instead for Client Age:

now - minimum of subscribe times of sites

I’d also recommend the following for all clients (30 or higher):

_BESClient_Log_Days=30

http://bigfix.me/fixlet/details/3913

The clients are all 9.0.876.0

I’ll update the Client Age property and see what effect it has.

In most cases, it shouldn’t have any effect, but the minimum subscribe time across all sites is a more accurate measure of what you are looking for.

Compared the two clauses on my desktop, and the difference was about 2 hours.

The __Global directory has the file __EMSerialization which has the serialization of the last command time when the client quits.

What would the implications be of the __EMSerialization file being “blank”/empty?
I have 16 systems where the __EMSerialization file has no lines in it. Some of these clients are Windows 2003 servers that are over 800 days old. Apologies to Microsoft, but I find it hard to believe that they have not been rebooted in over 800 days! I could believe it with Linux, but not Win2003!

The laptop that started all this off for me HAS a value in its __EMSerialization file …

Thu, 01 Jan 1970 00:00:00 +0000;Fri, 08 May 2015 12:03:00 +0000;

But obviously, IEM wasn’t in use in 1970! Any idea what might cause this? I have 1099 clients out of 31k showing the first date in the pair as Thursday, January 1, 1970 (epoch?). I’m trying to determine why some of our devices will not respond to, or even log that they received, the UDP message telling them that there is an Action waiting for them. Could this have anything to do with it?

I also currently have 59 machines returning a value of “0” for the following relevance:

number of lines whose(it contains "GatherHashMV command received" OR it contains "DownloadPing command received" OR it contains "ForceRefresh command received") of files whose((name of it ends with ".log" OR name of it ends with ".bkg") AND exists lines of it) of folder "__BESData\__Global\Logs" of parent folder of client

We keep our logs for the default of 10 days, so it would imply that these machines have not received a GatherHashMV, DownloadPing, or ForceRefresh command in the last 10 days. And I know I was sending out dynamically targeted actions yesterday, so there should be SOMETHING listed in these logs about it.

Yes, this implies that they have not logged any of those “command received” messages in the past 10 days.


Q: /* last command time & last report time */ (it as time) of (it as trimmed string) whose(""!=it) of substrings separated by ";" of lines containing ";" of files "__EMSerialization" of folders "__Global" of data folders of clients
A: Tue, 14 Jun 2016 11:59:18 -0700
A: Tue, 14 Jun 2016 12:04:44 -0700

The file is a serialization of an internal set of structures and represents 2 values. The “0” time is the 1970 time (PCs have that as their epoch), so that basically means “never”.

The first entry is your last command time; the second is your last report time.
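To make the file format described above concrete, here is a small sketch that splits one line of the file into the two values and treats the 1970 epoch as “never”. The function name `parse_em_serialization` is made up for illustration, and it assumes the file always looks like the samples in this thread (two RFC 2822 style dates separated by semicolons).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_em_serialization(line):
    """Return (last_command_time, last_report_time); None stands for 'never'.

    Assumes the layout seen in this thread: "<date>;<date>;".
    An empty line yields (None, None).
    """
    fields = [part.strip() for part in line.split(";") if part.strip()]
    times = [parsedate_to_datetime(field) for field in fields[:2]]
    # A timestamp equal to the epoch means the event never happened.
    times = [None if stamp == EPOCH else stamp for stamp in times]
    while len(times) < 2:
        times.append(None)
    return tuple(times)
```

Run against the line from the laptop earlier in the thread, the first value comes back as `None`, meaning the client has never recorded a command, while the second is the last report time of 8 May 2015.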
