RHEL relay goes deaf

I have four RHEL relays, but on one of them (also a TRC server/broker/gateway) the BESClient goes ‘offline’ every now and then. It’s still running, it hasn’t crashed (that I can tell), but it reads as offline. The BESClient log shows simply no action in several hours; nothing in var/log/messages.log. Sending a refresh from the BigFix console doesn’t ‘wake it up’.

Any ideas? Doing ‘service BESClient restart’ fixes it for a while, but obviously that’s not what I want to be doing. (And of course, the other three RHEL relays are champs.)

Thanks,
Andrew

Does sending a blank action do anything?

Did you recently update the RHEL release on any of these?

We patch and update regularly, on all at a go. They’re currently on RHEL 6.7.

After fiddling a bit, I found that it stays up (so far) if I start the client, then wait a while before starting the relay. I wonder if this might be due to the agent and relay starting up too close together.

I meant did you just go from something like RHEL 5 to RHEL 6 or something like that.

I wonder whether anyone can reference a doc on the interaction between the client service and the relay service on a relay. I.e. does the client service normally loopback to the local relay service, and maybe you are now forcing the client to select an upstream relay by delaying the relay service startup?

You should also be looking at /var/opt/BESClient/__BESData/__Global/Logs
And the logfile.txt of the Relay data folder
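
A quick way to tell whether the client has really stopped logging is to check how old the newest file in that log directory is. This is only a sketch: the directory is the one quoted above, newest_log_age is a made-up helper name, and it assumes GNU stat and log file names without spaces.

```shell
#!/bin/bash
# Print the most recently modified file in a directory and its age in seconds.
# Returns 1 if the directory is empty or unreadable.
newest_log_age() {
    local dir="$1" newest mtime now
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    mtime=$(stat -c %Y "$dir/$newest")   # last modification, epoch seconds
    now=$(date +%s)
    echo "$newest $((now - mtime))"
}

# Usage:
#   newest_log_age /var/opt/BESClient/__BESData/__Global/Logs
```

An age of several thousand seconds on a client that is supposedly running would match the "no action in several hours" symptom.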

The connection between relays would be TCP and would be separate from client traffic. Relays talk to each other via TCP and only reply to clients via UDP since clients don’t listen on TCP.

Unless there is a resource issue, starting the services close together shouldn’t have much of an impact and I don’t think would impact the evaluation cycle of the client if it is, in fact, completing evaluations. The “average duration of evaluationcycle of client” analysis comes to mind here.

Do you check this analysis, and what value does the client show after a couple of reports?

@JasonWalker might be on to something.

I believe the local client on the relay talks to the relay service through TCP so if the client can’t talk to localhost over TCP 52311 then it won’t work.

If you start the client first, then the relay, it might be that the client on the relay would talk to the parent relay instead of the localhost relay.
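
One way to test that theory is to look at which remote endpoint the client process holds an established connection to on port 52311. A rough sketch, assuming the client shows up as BESClient in netstat's PID/Program column (client_upstreams is a made-up name for illustration):

```shell
#!/bin/bash
# Filter `netstat -tnp` style output on stdin and print the remote endpoints
# that the named process is connected to on the given port.
# Assumptions: process name "BESClient", port 52311 as mentioned in this thread.
client_upstreams() {
    local prog="${1:-BESClient}" port="${2:-52311}"
    awk -v prog="$prog" -v port="$port" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $7 ~ prog && $5 ~ (":" port "$") {
            print $5    # foreign address, e.g. 127.0.0.1:52311
        }
    '
}

# Usage (as root, so netstat can show process names):
#   netstat -tnp | client_upstreams
```

If the output shows 127.0.0.1:52311 the client is using the local relay; another address would suggest it has picked an upstream relay instead.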

Ah! No, that’s not the case. These have always been on 6.something, although I forget where they started. They’re on 6.7 now.

I’ve wondered that too. I haven’t found anything about the theory of operation for a relay’s own client. The documentation states that a relay must have the agent installed, but no other details.

At the time of my last post (3 hours ago as I type this), I shut down both and waited for all connections to close. Then I started besrelay, waited for it to be happy, then started besclient. So far, besclient is happy and talking to the relay on 127.0.0.1:52311.

Looking at their init scripts, besclient doesn’t have any dependency on besrelay. Seems to me it should check to see if besrelay is installed and enabled, and if so, wait until besrelay is started. (Or, go ahead and start besrelay first.)
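
As a sketch of that ordering (a wrapper, not the vendor's init script): start the relay, wait until something accepts connections on the relay port, then start the client. The service names besrelay and besclient are the ones used in this thread; wait_for_port and start_relay_then_client are hypothetical helpers, and the 120-second timeout is arbitrary. It relies on bash's /dev/tcp redirection.

```shell
#!/bin/bash
# Return 0 once host:port accepts a TCP connection, or 1 after roughly
# timeout_secs seconds of retrying once per second.
wait_for_port() {
    local host="$1" port="$2" timeout_secs="${3:-60}" waited=0
    until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
        [ "$waited" -ge "$timeout_secs" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Start the relay first and only start the client once the relay is listening.
start_relay_then_client() {
    service besrelay start || return 1
    if wait_for_port 127.0.0.1 52311 120; then
        service besclient start
    else
        echo "relay did not open port 52311 in time" >&2
        return 1
    fi
}

# Usage (as root):
#   start_relay_then_client
```

Polling the port rather than sleeping for a fixed time means the client starts as soon as the relay is ready and fails loudly if the relay never comes up.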

On Windows, the besclient is set to delayed autostart, so that kind of builds in a delay.

More and more I’m thinking of shoving a sleep statement into /etc/init.d/besclient.

Ya know, I put the relays on Linux to avoid unnecessary chicanery. Harrumph.

Client debug logging enabled…

Windows has been the primary platform for a very long time.

At the user group yesterday, @steve said he’d like to talk to folks about Linux relays. In a personal followup, he offhandedly remembered that in some cases they had needed specific settings – maybe kernel parameters, he wasn’t sure.

I just checked and – yup – my relay’s client is still exhibiting this bizarre behavior.

@steve is definitely correct that there are good reasons why Linux would be able to handle many more clients than Windows just due to the OSes themselves, but on top of that Linux should be able to do so with fewer resources.

I have played around with CentOS relays a bit, but I have never used them in production. It is definitely worthwhile to investigate, but it doesn’t seem like all the kinks have been worked out at this point.

I’ve had applications in the past that needed specific kernel parameters on Linux, accounting for open files, semaphores, and the like. I wouldn’t be surprised if that is the case here.
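
To compare the troublesome relay against a healthy one, a read-only sketch like the following prints the usual suspects. Which setting, if any, matters for BigFix is not established in this thread; show_limits and proc_limits are made-up helper names, and the process name in the usage line is an assumption.

```shell
#!/bin/bash
# Print open-file and semaphore limits visible to the current shell.
show_limits() {
    echo "per-process open files (soft): $(ulimit -Sn)"
    echo "per-process open files (hard): $(ulimit -Hn)"
    echo "system-wide open files max: $(cat /proc/sys/fs/file-max 2>/dev/null || echo unavailable)"
    echo "file handles (allocated, free, max): $(cat /proc/sys/fs/file-nr 2>/dev/null || echo unavailable)"
    echo "semaphores (SEMMSL SEMMNS SEMOPM SEMMNI): $(cat /proc/sys/kernel/sem 2>/dev/null || echo unavailable)"
}

# Print the open-file limit that applies to an already running process.
proc_limits() {
    local pid="$1"
    grep 'Max open files' "/proc/${pid}/limits"
}

# Usage:
#   show_limits
#   proc_limits "$(pidof BESClient)"   # process name is an assumption
```

Running this on both the flaky relay and one of the healthy ones and diffing the output would show whether the limits differ at all.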