Computers dropping out of BigFix - VPN & DMZ

Hi,
We have been having problems with clients dropping out of BigFix at random for no apparent reason; the Asia Pacific region is affected most. Most, if not all, of the missing clients are connected to VPN, and we have confirmed that they have the BigFix Client installed and running.
The problem we are seeing is:
About 300 of our 6000 clients disappear from BigFix (not all at the same time). A small number of them show up again after a few days or a week. I have tried enabling Persistent Connection on both relays and clients, and tried computer settings from best-practice guides and other forum threads. Nothing seems to resolve the issue.
Because Asia Pacific is hit hardest (China has the bulk of the issue), here is how China is set up:
Number of clients: 300+ in 3 cities
Number of Relays: 3 ( 1 in each city)
Relay Selection: Automatic
BESClient_RelaySelect_Max TTL was set to “10”, but we recently changed it to “30”.
At some point during the day, many clients seem to think a relay on the other side of the world is the closest, which is why we tried changing the TTL to 30.

I’m hoping someone has experienced this issue and can help us resolve it.

Many thanks!!

Sounds like you may have several things going on.

For computers disappearing from the console: a computer failing to find a relay or send reports does not get removed from the console. You may have automatic computer removal set in the BESAdmin Tool (check the Maintenance tab). There are options to delete computers that haven’t reported in a number of days; check that your value, if set, is not too short.

If your clients are set to Automatic Relay Select, ensure they can ping your relays. The client determines the distance to relays via ICMP ping messages. It sends a series of pings to all the relays, starting with a TTL of 1, then increasing it until a relay responds or it reaches the MaxTTL value. If your clients are selecting a relay too far away, the MaxTTL may be too high or the ICMP traffic may be blocked. Check that your VPN rules allow both ICMP and your BigFix port (tcp/52311 by default).
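
The expanding-TTL search described above can be modeled in a few lines. This is only a simulation to illustrate the idea, not the client's actual code: the relay names and hop counts are made up, and a hop count of None stands for a relay whose ICMP replies never reach the client.

```python
from typing import Dict, List, Optional


def select_nearest_relays(hops_to_relay: Dict[str, Optional[int]], max_ttl: int) -> List[str]:
    """Return the relays that answer at the lowest TTL, or [] if none answer by max_ttl.

    hops_to_relay maps a (hypothetical) relay name to its hop distance,
    or to None when ICMP to that relay is blocked.
    """
    for ttl in range(1, max_ttl + 1):
        # A relay "responds" once the TTL is large enough to reach it.
        responders = sorted(
            name
            for name, hops in hops_to_relay.items()
            if hops is not None and hops <= ttl
        )
        if responders:
            return responders
    return []


# Local relays reachable by ping: the nearest one wins.
healthy = {"relay-shanghai": 4, "relay-beijing": 9, "relay-us-east": 22}
print(select_nearest_relays(healthy, max_ttl=30))

# ICMP to the local relays blocked: with MaxTTL 30 a far-away relay is chosen,
# with MaxTTL 10 no relay is found at all.
blocked = {"relay-shanghai": None, "relay-beijing": None, "relay-us-east": 22}
print(select_nearest_relays(blocked, max_ttl=30))
print(select_nearest_relays(blocked, max_ttl=10))
```

In this model, raising MaxTTL while ICMP to the nearby relays is blocked makes a distant relay look like the closest one that answers, which matches the symptom described in the original post.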

The relays that it attempts can be tuned with the Affiliation SeekList of the client, and the Affiliation AdvertisementList of the Relay. If you are setting values for either of these, ensure you are setting values for both the clients and the relays; else you may find the clients not even attempting to use the relays you want. Common errors there are to base the values on IP Subnets or Active Directory Sites, which may not resolve as you expect when on VPN.

If the client can’t find a relay at all, it will fall back to the root server (again, by default) without trying the ping; it just sends the BigFix TCP traffic and hopes for the best. At this point it’s in ‘Failover Relay’ mode, tunable with the _BESClient_RelaySelect_FailoverRelay and _BESClient_RelaySelect_FailoverRelayList client settings, or the FailoverRelay defined via the BESAdmin Tool.
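
As a rough mental model of that fallback order, here is a small sketch. It is not the client's real logic: the function name, the host names, and the assumption that the root server is still tried after the failover list is exhausted are all illustrative guesses.

```python
from typing import Callable, Optional, Sequence


def choose_relay(
    auto_selected: Optional[str],
    failover_list: Sequence[str],
    root_server: str,
    can_connect: Callable[[str], bool],
) -> Optional[str]:
    """Pick a relay: automatic selection first, then failover relays, then the root."""
    if auto_selected is not None:
        return auto_selected
    # No ping is involved here: each failover relay is simply tried in order.
    for relay in failover_list:
        if can_connect(relay):
            return relay
    # Assumption: the root server is the last resort.
    if can_connect(root_server):
        return root_server
    return None


# Hypothetical hosts; only the DMZ relay accepts connections in this example.
reachable = {"dmz-relay1.example.com"}
print(
    choose_relay(
        None,
        ["vpn-relay.example.com", "dmz-relay1.example.com"],
        "root.example.com",
        lambda host: host in reachable,
    )
)
```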

You can turn on debug logging at the client and perform a Relay Select to see how the client is behaving. Or you can engage a support ticket and they can help you through the process.

All these settings and many more are described at https://help.hcltechsw.com/bigfix/9.5/platform/Platform/Config/r_client_set.html#r_client_set__arhd

Thanks Jason.

Yes, I’m pretty sure we have multiple issues too :(
Unfortunately, getting the Info Security and Network teams to troubleshoot this issue together is like pulling wisdom teeth.
We have opened a few tickets with IBM, and now HCL, on these issues and were prompted to change some computer settings, but that has not really resolved the issue.

We do have scheduled computer removal set for 45 days.
ICMP is only open internally and is blocked from the internet.
The VPN rules allow port tcp/52311 and should allow ICMP, but I will double-check the ICMP.

We do have 2 DMZ relays set up and have configured “_BESClient_RelaySelect_FailoverRelay” with the value set to both DMZ relays separated by “; ” (does it matter if I have a space after the semicolon?).

What is the difference between “_BESClient_RelaySelect_FailoverRelay” and “_BESClient_RelaySelect_FailoverRelayList”? Do I need to add both settings?

Here is something very odd about which clients are able to connect to the DMZ relays.
I was under the impression that a client connected to VPN would hit one of the internal relays.
If not, it should hit one of the 2 DMZ relays.
We noticed a roughly even number of clients (70-80) connected to the DMZ relays.
What’s odd is that, of all the clients connected to the DMZ relays, most are US computers using Fios (such as mine), and both of my company computers always point to one or the other of the DMZ relays.
Those using Comcast (such as my boss) and other ISPs are not able to hit the DMZ relays.
Has anyone ever seen this issue?

You may want to try adding this setting also …

_BESClient_RelaySelect_AlwaysOnIPListChange=1

That will force the client to reregister when its IP changes. It may help the client be more resilient when moving on and off VPN.

Thanks Jared. I added “_BESClient_RelaySelect_AlwaysOnIPListChange” = “1” to a list of computers that are having trouble finding the closest relays and let them run for a few days, but it doesn’t seem to have improved things. We are using the ATT Global Network client for VPN. I’m leaning toward the issue being how ATT routes connections for computers on Fios vs. cable and DSL. Most, if not all, of the working computers are on Fios. We have tried contacting ATT support, but don’t have a solution yet.

Remember, if the Agents are Automatic Relay Select, they also utilise the UDP Protocol. You may have to open ports for that too, which may explain why they are partially working…

Also, what are you seeing in the client log on those machines? Are they registering? Or are you seeing the registration fail? You may want to try to sync that up with VPN connection logs.

I believe _BESClient_RelaySelect_FailoverRelay can only be a single relay, not a list, so it sounds like you have that set incorrectly.

This is the one that allows a list of failover relays to be tried in succession: _BESClient_RelaySelect_FailoverRelayList

You might want your failover list to start with a VPN relay, followed by a DMZ relay.

This setting won’t help with your primary issue, but it will allow clients that are currently talking to a DMZ relay to switch to a VPN relay when the VPN is connected, so it is a good idea for all clients using DMZ or VPN relays.

You really want both TCP & UDP over 52311 and ICMP to be allowed over the VPN. You technically don’t need ICMP to work over VPN for the failover relay settings to work, since those don’t rely on ICMP. It does seem like this might be your issue, that ICMP isn’t working to the VPN relays.
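
To confirm the TCP half of that from an affected machine, a quick connect test is usually enough. The sketch below uses only Python's standard library; the function name and the example host are placeholders. It says nothing about UDP or ICMP, which need separate checks (for example, a plain ping while on VPN).

```python
import socket


def tcp_port_open(host: str, port: int = 52311, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (hypothetical host name, not run here):
#   tcp_port_open("vpn-relay.example.com")
```

Running it once on VPN and once off VPN, from both a Fios machine and a Comcast machine, would show whether the difference between ISPs appears at the TCP level.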

What is the c_code of the clients after they disappear?

I’ve had an ongoing issue on AIX systems where the C_Code gets changed to “DED” in the besclient.config. Once that happens, I lose sight of it.

None of the client settings suggested by support helped. Pretty sure it was either related to network timeouts or something filling up /var/ temporarily.

I’ve recently made /var/opt/BESClient a separate logical volume on the chronic “DED” endpoints, and this seems to have worked.

This is on VPN and in network… both UDP and TCP are opened, ICMP is blocked from the Internet. Do we need to open ICMP to the 2 Internet facing relays? I have a hard time explaining to our Info Security tech.

No registration failures in the latest client log, but as previously stated, the TTL maximum is set to “30” hops.

Thanks for the explanation on the 2 “_BESclientSelect…” settings, I’ll change it on a few computers having the issue and see.

Question on the “VPN relay”: do we need to set up additional relays on the various IP ranges assigned by the VPN connections?

According to our Info Sec team, TCP and UDP on 52311, as well as ICMP, are open on the VPN.

Where do I find the c_code, if any, on the Windows systems? The only Linux systems we have are servers, and they are not having the issue. Only the 2 DMZ relays are installed on Linux. All other internal relays are installed on either Windows 2012-2019 or Win10.

Sorry, I’m a little confused about the order in which the relays need to be set. Is this correct?

“_BESClient_RelaySelect_AlwaysOnIPListChange”=“1”
“_BESClient_RelaySelect_FailoverRelay”=“Internal Relay”
“_BESClient_RelaySelect_FailoverRelayList”=“Local_Relay;DMZ_Relay”

What does the Client Logfile say when they disconnect? Also check the Relay log

On Unix the c_code is in besclient.config; on Windows I think it is a registry setting.
Your issue sounds more like a network problem. Are any of your Linux endpoints in the DMZ subnet? Do they have the same issue?

For the internet facing relays, I would open ICMP unless you are using failover settings exclusively for this.

The relays (as a whole) need to at least accept connections from all of the IP ranges of the VPNs. You might want additional relays for each IP range just for things like ease of UDP and Wake-on-LAN, but it shouldn’t be strictly required. The biggest reason to have more relays is to make the network more efficient by having the relays be closer to the clients in the network topology, but if everything is in the same datacenter, then that isn’t a major factor.

That should mean the VPN relays would work with automatic relay selection.

I don’t know what that is, I wouldn’t worry about it for windows systems.

These should be set to the FQDN of the relay, the short name of the relay, or the IP of the relay. You can just use _BESClient_RelaySelect_FailoverRelayList; there is no need to use both.
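
On the earlier question about a space after the semicolon: this thread does not confirm whether the client tolerates one, so the simplest way to sidestep it is to build the value without any spaces. A small helper along these lines would do it; the function names and host names are illustrative only.

```python
from typing import Iterable


def build_failover_list(relays: Iterable[str]) -> str:
    """Join relay names with ';' after trimming whitespace and dropping empty entries."""
    cleaned = [relay.strip() for relay in relays if relay.strip()]
    return ";".join(cleaned)


def normalize_failover_value(raw: str) -> str:
    """Clean up an existing semicolon-separated value such as 'a; b ; c'."""
    return build_failover_list(raw.split(";"))


# Hypothetical hosts: VPN relay first, then the two DMZ relays.
print(normalize_failover_value(
    "vpn-relay.example.com; dmz-relay1.example.com ; dmz-relay2.example.com"
))
# prints: vpn-relay.example.com;dmz-relay1.example.com;dmz-relay2.example.com
```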