Weird Client behavior in a very unusual environment - need some eyeballs

mxc0bbn · February 18, 2016, 1:38pm

The environment I’m currently working with has multiple firewalls:

For simplicity’s sake, I’ll define just two zones, Zone 1 and Zone 2, with a firewall between them.

Zone 1 contains:
BigFix Server, several clients

Zone 2 contains:
BES Relay, several clients

The firewalls between the zones permit only port 52311 traffic, and only between the relay server and the root server. This means that if the clients in Zone 2 can’t talk to the relay in their zone, they have no way to “go home” to the BES Server.

So I’m aware of the problems this presents. I can’t speak to the reasons for the design, as I was not involved in the original architectural discussion. My question is really about a strange behavior that RHEL clients are exhibiting:

Several RHEL clients in Zone 2 have lost their connection to the BES Server (even though the relay in that zone is up), but only AFTER being rebooted.

The RHEL boxes that have not been rebooted are reporting in just fine. They are all set to “manual” relay selection.

When I look in the client log, I can see that they are trying to register directly with the BES Server and are ignoring the RelayServer setting in the besclient.config file.
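To confirm what relay a client is actually configured to use (before and after a reboot), the relay settings can be pulled out of besclient.config. The Python sketch below assumes the usual Linux layout, where each setting is an INI-style section named `Software\BigFix\EnterpriseClient\Settings\Client\<setting>` holding a `value` key. That layout, the sample section contents and the host names are assumptions for illustration, so compare against a real client’s file before relying on it.

```python
import configparser
from typing import Dict

# Assumed prefix of every client-setting section in besclient.config.
SETTING_PREFIX = "Software\\BigFix\\EnterpriseClient\\Settings\\Client\\"

# Hypothetical excerpt of a besclient.config; host names are placeholders.
SAMPLE = r"""
[Software\BigFix\EnterpriseClient\Settings\Client\__RelayServer1]
value = http://relay.zone2.example.com:52311/bfmirror/downloads/

[Software\BigFix\EnterpriseClient\Settings\Client\_BESClient_Log_Days]
value = 10
"""


def relay_settings(config_text: str) -> Dict[str, str]:
    """Return {setting name: value} for every setting whose name mentions 'Relay'."""
    # Only '=' separates key and value, and '%' interpolation is off, so URLs survive intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    found = {}
    for section in parser.sections():
        if not section.startswith(SETTING_PREFIX):
            continue
        name = section[len(SETTING_PREFIX):]
        if "Relay" in name and parser.has_option(section, "value"):
            found[name] = parser.get(section, "value")
    return found


if __name__ == "__main__":
    print(relay_settings(SAMPLE))
```

On a real RHEL client the text would come from the client’s config file (commonly `/var/opt/BESClient/besclient.config`, path assumed here), which would show whether `__RelayServer1` still points at the Zone 2 relay after the reboot.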

So what am I missing here?

Thanks,
Mike

LawrenceG · February 19, 2016, 1:01pm

Hi Mike,

Can you post the content of your besclient.config file from one of your clients here please?

Are there any errors in the client log, such as a general transport failure (winsock error -6)?

Thanks,
Lawrence

mtrain · February 19, 2016, 1:19pm

@mxc0bbn … I have seen something close, but not quite the same (see Issue Setting Relay for a Linux Client on Initial Installation). In my case, I defined the relay in besclient.config by IP address rather than by hostname, which removes any need for hostname resolution. After that, I had no problems, even across system restarts. Maybe give that a try in your situation.

Of course, that means that when I look at the BigFix console, I might see the same relay twice - once for those clients who connected to it by IP hostname and once for clients who connected to it by IP address … but I can live with that.

–Mark

LawrenceG · February 19, 2016, 5:03pm

Right. I had the same issue, where the besclient.config file on a Red Hat machine kept reverting the changes we made after the install. That happened when we used the FQDN in the relay config; after we switched to the relay’s IP address instead of its FQDN, it started to work.

One of the reasons in our case was that the RHEL client could not resolve the DNS name of the relay defined in the config file, so it went directly to the root server (this is by design).
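Lawrence’s explanation comes down to name resolution: if the relay’s name can’t be resolved, the client falls back to the root. The Python sketch below runs the same kind of lookup with the system resolver, and it also shows why the IP-address workaround helps: an IP literal needs no DNS lookup at all. It only approximates what the BES Client sees (the client may not resolve names the same way, as Mark’s hosts-file experience suggests), and the URLs are placeholders.

```python
import socket
from urllib.parse import urlsplit


def relay_host_resolves(relay_url: str, default_port: int = 52311) -> bool:
    """True if the host part of a relay URL resolves to at least one address."""
    parts = urlsplit(relay_url)
    host = parts.hostname  # None when the string has no network location
    if not host:
        return False
    try:
        # IP literals are converted directly; names go through the system resolver
        # (/etc/hosts and DNS, in the order set by /etc/nsswitch.conf).
        socket.getaddrinfo(host, parts.port or default_port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        return False
    return True
```

For example, `relay_host_resolves("http://10.2.0.15:52311/bfmirror/downloads/")` succeeds without any DNS server, while the same call with the relay’s FQDN depends on the client’s resolver configuration.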

mtrain · February 19, 2016, 5:35pm

… and in my case, even adding the relay’s hostname to the hosts file on the RHEL client didn’t make any difference.

–Mark

gearoid · February 19, 2016, 7:39pm

I’d start by making sure that port 52311 is definitely open and that you can actually connect from the client machines to the relay on 52311.
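A quick way to run that check from a client without telnet is a plain TCP connect. The Python sketch below only proves that something accepted a TCP handshake on the port; it does not test ICMP, and it says nothing about which target the BES Client will actually choose.

```python
import socket


def can_reach(host: str, port: int = 52311, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Usage would be something like `can_reach("relay.zone2.example.com")`, where the host name is a placeholder for the Zone 2 relay.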

Your agent may not be ignoring the configuration settings; rather, it may be trying to use them and failing to connect. In that situation, an agent will fall back to using the root server’s information.

AlanM · February 19, 2016, 11:48pm

Clients can decide to go to the root server under a few conditions, and that won’t help in your scenario. Remember that the client needs to be able to ping the relay to consider it valid.

If you have control over DNS in Zone 2, you can create a DNS alias so that the root server’s name points to one of your relays.

If you can’t or don’t want to do that, you can apply this setting, which the clients will try before going to the root server:

_BESClient_RelaySelect_FailoverRelayList

Type: String 
Version: 9.0 
Platform: All 
Default: 
Requires Client Restart: NO 
Description: A semicolon-delimited list of failover relay names, used if configured and nobody is responding to pings. If present and not empty, it replaces _BESClient_RelaySelect_FailoverRelay.
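The precedence in that description is easy to misread, so here is a small Python model of it: a non-empty `_BESClient_RelaySelect_FailoverRelayList` is split on semicolons and replaces `_BESClient_RelaySelect_FailoverRelay`; otherwise the single failover relay applies. This illustrates the documented rule only, not BigFix code, and skipping blank entries inside the list is an assumption.

```python
from typing import List, Optional


def effective_failover(failover_list: Optional[str],
                       failover_relay: Optional[str]) -> List[str]:
    """Failover targets implied by the two settings, in order."""
    # A present, non-empty list wins over the single-relay setting.
    if failover_list and failover_list.strip():
        # Skipping blank entries (e.g. from a trailing ';') is an assumption.
        return [item.strip() for item in failover_list.split(";") if item.strip()]
    if failover_relay and failover_relay.strip():
        return [failover_relay.strip()]
    return []
```

With hypothetical values, `effective_failover("relay-a;relay-b", "relay-c")` gives `["relay-a", "relay-b"]`, and the single `relay-c` only applies when the list setting is absent or empty.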

mxc0bbn · February 20, 2016, 5:38pm

Sorry about the delay in replying; I’m getting settled at Interconnect.

So overall, I think this might be a non-BigFix issue, but there’s still some weirdness going on. I’ll explain:

This was occurring on various RHEL servers at the customer site. After doing every bit of troubleshooting I could think of, I decided to take notes on the config files and logs of a couple of RHEL servers that had not yet been rebooted and were still communicating.

What I found was that those servers did not lose their connection to the BES Server even after rebooting, although they were initially configured in much the same way as the ones that are losing connection. So, in essence, this can’t be a BESClient issue, because if it were, it would happen on every RHEL server in that region.

The reason I say there is still some weirdness is that I can take one of the non-communicating servers and test ICMP (ping) to the relay in the region, plus TCP and even telnet to it on port 52311, and all of those work. But the damn server still refuses to try to connect to the relay and instead wants to go straight back to the BES Server.

Now, I know that on initial installation a client will want to communicate with the root server to get the actionsite, relays.dat, and so on. But if it has already been talking to its relay for quite a while and has all of those, why would it still want to go back to the root when there is a relay right there in its zone?

I used a dirty trick to get the clients to communicate and receive content, but it’s not a solution, just a workaround.

I made an entry in the hosts file that maps the root server’s name to the relay server’s IP address. Of course, it worked: the client registered with what it thinks is the root server, and it communicates perfectly.
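The hosts-file trick works because the resolver normally consults /etc/hosts before DNS (per the `hosts:` line in /etc/nsswitch.conf), so the root server’s name quietly lands on the relay. The Python sketch below parses hosts-file text the same basic way to show what a given name maps to; the host names and addresses are made up for the example.

```python
from typing import Optional

# Hypothetical /etc/hosts on a Zone 2 client, including the workaround entry.
SAMPLE_HOSTS = """\
127.0.0.1   localhost localhost.localdomain
# Workaround: send traffic for the root server's name to the Zone 2 relay
10.2.0.15   bigfix-root.example.com bigfix-root
"""


def hosts_lookup(hosts_text: str, name: str) -> Optional[str]:
    """Return the first address mapped to `name` in hosts-file text, or None."""
    wanted = name.lower()  # host names are case-insensitive
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:  # blank or malformed line
            continue
        address, names = fields[0], fields[1:]
        if wanted in (n.lower() for n in names):
            return address
    return None
```

Here `hosts_lookup(SAMPLE_HOSTS, "bigfix-root.example.com")` returns the relay’s address, which is exactly why the client ends up registering through the relay while believing it reached the root.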

Reading your responses, I think I’ve covered some of your questions, and I even came up with the same workaround that one of you suggested.

So, bottom line, I might have done a lot of work for nothing, since this may turn out to be caused by some other configuration change they made on those RHEL boxes (not SELinux; I already verified it’s disabled).

Mike

JasonWalker · February 21, 2016, 2:07am

I’ve seen something similar in the past. I think it may be related to blocked ICMP or a missing or incomplete relays.dat on the client.

You may be able to avoid the hosts-file dirty trick by specifying _BESClient_RelaySelect_FailoverRelay or _BESClient_RelaySelect_FailoverRelayList. The client should use these as a last resort instead of the root server, regardless of ICMP/ping response.
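Putting the replies together, the fallback order described in this thread is: the configured relay if it responds, then the failover relay, and only then the root server. The Python model below encodes that order with the reachability check passed in as a function. It is a simplification assembled from the posts above, not the BES Client’s actual selection algorithm, and all names are placeholders.

```python
from typing import Callable, Sequence


def choose_target(relays: Sequence[str],
                  failover: Sequence[str],
                  root: str,
                  responds: Callable[[str], bool]) -> str:
    """Pick where a client registers, following the order described in this thread."""
    # 1. Configured relays are used only if they respond to the reachability check.
    for relay in relays:
        if responds(relay):
            return relay
    # 2. Failover relays are a last resort, used without a reachability check.
    if failover:
        return failover[0]
    # 3. Nothing else available: go back to the root server.
    return root
```

If `responds` were ICMP-based, a relay that blocks ping but has port 52311 open would fall through to the root unless a failover relay is set; whether manual selection really requires a ping response is the open question in the last posts.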

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Endpoint%20Manager/page/Configuration%20Settings

mtrain · February 23, 2016, 9:06pm

So, to make sure I understand this: if port 52311 is open in this scenario but ICMP is not, the client will not be able to connect to the relay because it can’t ping it, even if the relay selection type is manual?

–Mark

mxc0bbn · March 2, 2016, 2:28am

Thanks for the replies, all. After some more testing, several other servers that were rebooted did not exhibit this behavior, so it looks as if it’s not related to the BES Client at all (maybe). I’ll let them figure out what’s different about those servers before I keep pounding my head.

jgstew · March 3, 2016, 11:41pm

I don’t think ping is required for manual relay selection, but I’m not certain.