Relay Autoselect issues

Greetings all,

Odd issue:

I don’t believe relay autoselect is working at all at the moment, and I think it may have to do with port 52311 redirecting to https…

So, running the client diags with the -r flag gives me results:

DNS name        IP Address    Hops    RelayVersion

IP              IP            6       Not Reached    // the IPs are correct; I set them with a name override to test
correct name    IP            5       Not Reached    // all good except the Not Reached
correct name    IP            3       Not Reached    // here is where it gets really odd:
Get Request Failed (http://DNSNAME:52311/cgi-bin/bfenterprise/clientregister.exe?requesttype=version) : 403 Forbidden.
correct name    IP            2       Not Reached
Get Request Failed (http://DNSNAME:52311/cgi-bin/bfenterprise/clientregister.exe?requesttype=version) : 500 Can’t connect to DNSNAME:52311 (connect: Unknown error).

Those are the 4 types of errors…

Then down further:
Based on these results, BES Client should have chosen (hopcount 1):relay1url or relay1url
BES Client last chose: relay2url

What caused me to start looking at this? I have machines talking to the main BES server and I was trying to move them. I have also set up relay affiliation, because we have some relays that should be used more but are not, due to being more hops away.

Couple of other things to add…
The relay affiliation seems to work, a small percentage of machines actually do move, very small though.
What kind of tracert dependence is there? For some clients a tracert from client to relay comes through clean; for others there are timeouts at certain hops. Is this a problem?
The main BES server has these set:

_WebReports_HTTPServer_HostName http://CORRECT NAME:52311
_WebReports_HTTPRedirect_PortNumber   Blank
_WebUI_Redirect_Enable 1 
_BESRelay_HTTPServer_PortNumber 52311

See why I think traffic on 52311 is going to https? More…

notice above: the url for the register service…
http://RELAY:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version works…
http://RELAY:52311/cgi-bin/bfenterprise/clientregister.exe?requesttype=version does not…
on the main server both get redirected to https, and fail a cert check…

What else to try? Thoughts?

THANK YOU ALL, relatively new BigFix admin… So much to learn especially since it is inherited. The previous admin is great and works well with me still! Thanks Pete!

Edit to add, the main server has the autoselectable flag set to 0 and is in its own affiliation group that none of the clients are set to.

A couple of other choice lines from the diags:

First, what version of the platform are you talking about?

Second, we use 52311 for both HTTP and HTTPS in BigFix inter-component communication, so the port number gives no indication of the type of connection.

Pay attention to the following lines in the client logs.

This example shows that the connection was made with HTTPS or it wouldn’t say https here.

Registered with url 'https://<relayname>:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60...

This example shows you what the client selected as the relay (might be different if it has a local relay or server)

Relay selected: <relayname> at: x.x.x.x:52311 on: IPV4 (Using setting IPV4ThenIPV6)
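
A minimal sketch of pulling those two lines out of a client log, in case it helps when scanning a long file. The two patterns are built from the lines quoted above; anything else about the log layout (timestamps, indentation, encoding) is an assumption, and `summarize` is a hypothetical helper name, not part of any BigFix tooling.

```python
import re

# Patterns modeled on the two log lines quoted in the post above.
# Whatever surrounds them in a real log is not assumed here; search() is
# used so leading timestamps or indentation do not matter.
REGISTERED = re.compile(r"Registered with url '(https?)://([^:/']+):(\d+)")
SELECTED = re.compile(r"Relay selected: (\S+) at: (\S+):(\d+)")


def summarize(log_lines):
    """Collect (scheme, host) for registrations and (name, ip) for selections."""
    registrations = []
    selections = []
    for line in log_lines:
        match = REGISTERED.search(line)
        if match:
            registrations.append((match.group(1), match.group(2)))
        match = SELECTED.search(line)
        if match:
            selections.append((match.group(1), match.group(2)))
    return registrations, selections
```

The scheme in the first tuple is the quickest way to confirm whether registration went over http or https, since the port is the same for both.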

I’m sorry, 9.5.7.
Correct, and that is the problem with the port: our default port appears to be redirecting EVERYTHING to https.
RegisterOnce: Attempting secure registration with 'https:
so yes, using https

But why is relay selection failing? Why the Not Reached reported by the client diags -r? That’s where the issue is: either the algorithm is not considering https, or it is using a URL with the difference in capitalization?

The bottom line is that relay selection appears to know the hop count but fails the connect test, and so ignores the hop count.

Ok, I think I cleared up some of the issues I was having… but there is one still remaining and I can’t imagine I’m the only one. I think the autoselect process is just not working for many machines. I would love for someone to try the following URLs and report back:

http://RELAY:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version
http://RELAY:52311/cgi-bin/bfenterprise/clientregister.exe?requesttype=version

The only difference is the capitalization of the last part. Why does this cause autoselect to fail? It appears that autoselect uses the lowercase version, not the capitalized one. I base this on a fresh client diags run from the fixlet task: the lowercase URL is the one reported during that test as having issues. The capitalized version is the one that works, and it is similar to what is reported in the relay logs, but autoselect appears to be using the lowercase version. Can anybody verify the functionality of the two URLs on their instance?

I would really love to find out if the relay links listed work on someone’s instance. The difference in capitalization is what is making this an issue lol.

You can always try these in a browser. Here’s my result in Chrome:
http://RELAY.FQDN:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version
ClientRegister
Version 9.5.7.94

http://RELAY.FQDN:52311/cgi-bin/bfenterprise/clientregister.exe?requesttype=version
Error: no query parameters specified – unknown request type
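
For anyone who wants to repeat that comparison without a browser, here is a rough sketch using only the Python standard library. The path and port come from the URLs in this thread; `version_urls` and `probe` are hypothetical helper names, and the relay hostname is whatever you pass in.

```python
import urllib.error
import urllib.request

# Path and default port as they appear in the URLs quoted in this thread.
PATH = "/cgi-bin/bfenterprise/clientregister.exe"


def version_urls(relay, port=52311):
    """Build the two query-string variants being compared."""
    base = f"http://{relay}:{port}{PATH}"
    return {
        "mixed": base + "?RequestType=Version",
        "lower": base + "?requesttype=version",
    }


def probe(url, timeout=5):
    """Return (status, body text), or (status or None, error text) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, body.strip()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        return err.code, str(err.reason)
    except (urllib.error.URLError, OSError) as err:
        # No usable answer at all (DNS failure, refused connection, timeout).
        return None, str(err)
```

Calling `probe(url)` for each entry of `version_urls("RELAY.FQDN")` from a client machine should show whether the two spellings behave differently on your relay. Note that `urlopen` follows redirects, so a server that redirects to https may surface as a certificate error rather than a version string.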

Thanks for that, Jason! The reason I think that matters: if you look at the autoconfig part of the log, I see failures that are using the lowercase version, the one that does not work. Not all the entries have the URL listed, so there is another failure there, but they all report Not Reached. And if the URL does not return the version number, then the autoconfig thinks it can’t find the relay.

To see this, run this with the -r switch: http://software.bigfix.com/download/bes/util/BESClientDiagnostics-9.5.3.exe
(someone really needs to update the utilities page)
go down to end of the screen output… you will see:
======= Relay Selection ===========

Wait time out for tracert is: 100. Maximum num of hops is: 20.

  • BES Client autoselection value - 1

  • Parsing Relays.dat.

  • Number of Relays Found:28

  • Retrieving hop count and version information for each relay. (this may take a while…)

    DNS name IP Address Num Hops RelayVersion

I get log entries with IPs in both the DNS name and IP address fields; this is due to the name override I put in place to test. The number of hops is reported correctly, but RelayVersion, which would come from the URL we are discussing, just says Not Reached for everything. Then I get an entry like…

Get Request Failed (http://Relay:52311/cgi-bin/bfenterprise/clientregister.exe?requesttype=version) : 403 Forbidden.

relayFQDN	Correct IP		6		Not Reached

Notice the faulty capitalization in the URL. I don’t care about the 403 error here; that is not what I’m talking about. I’m talking about the URL never returning the relay version number.

then you get to the bottom, after EVERY relay can’t be reached and:
Based on these results, BES Client should have chosen (hopcount 2):
RelayAFQDN or RelayBFQDN
BES Client last chose: RelayCFQDN
— WARNING: BES Client has incorrectly chosen a BES Relay.
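
As a rough illustration of what that warning seems to be checking, here is a sketch that picks the relays with the lowest hop count and compares them with the last choice. That the tool's "should have chosen" is simply the minimum hop count is an assumption based on the output above; `expected_by_hops` is a hypothetical name and the hop values in the example are made up.

```python
def expected_by_hops(hops_by_relay):
    """Return (lowest hop count, sorted relay names at that hop count).

    hops_by_relay maps relay name -> hop count, or None when no hop
    count could be measured. Returns (None, []) if nothing was measured.
    """
    measured = {name: hops for name, hops in hops_by_relay.items()
                if hops is not None}
    if not measured:
        return None, []
    best = min(measured.values())
    return best, sorted(name for name, hops in measured.items() if hops == best)


# Example with made-up hop counts mirroring the output above.
best, names = expected_by_hops({"RelayAFQDN": 2, "RelayBFQDN": 2, "RelayCFQDN": 6})
last_chosen = "RelayCFQDN"
mismatch = last_chosen not in names  # True here, which is when the warning would fire
```

A check like this looks only at hop count, so it would flag a relay chosen for any other reason as "incorrect".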

Would someone mind running the diags with -r to see whether the RelayVersion column populates?

Oh, yes, I did turn off the firewall for this test, and this is the same result on every machine I have tried it on.

One of the things I have struggled with is load balancing the relays, and machines talking directly to the main server despite every effort to boot them. It all came down to manipulating the voodoo of autoselection, but if the underlying test, the register version type URL used to verify connectivity, is or has been broken, then sooooooo much time has been wasted.

I do have a support ticket in but getting the usual, run this diag, do it with multiple machines, do this, that and the other on this one computer. The issue is I can’t find a single machine that this IS working on. Not a one off machine having a problem here or there.

I also can’t get an answer on whether this or that is the URL it uses, or on whether a Not Reached in the last column is causing autoselect to fail completely, as I suspect. Everything in the documentation says a valid way to test whether a relay is working and connectivity to it is fine is to go to the register version type URL. But I see both cases for that URL, so either this is new behavior or we need it stated what the real URL to use is.

Maybe a silly question, but is ICMP open from the clients to the relays? The client will first try to “ping” each Relay to determine whether it’s reachable, and if the relay doesn’t respond to ping the client won’t attempt to hit it with http/https on 52311.
In addition to hitting the register URL, try pinging it.

not a silly question, yes, ICMP is for sure open, v6 and v4. I standardized the config of the firewalls on the relays. just checked, yes ping’able.

have you run the client diagnostics with the -r switch on your instance? I wonder if the report looks anything like the one I am seeing.

to add, I ran the test with the firewall OFF on the local machine, and same issue but like I said, I do not think this is specific to one machine. I don’t know that I have ever seen a relay selection report that worked.

I do have a support ticket. It has gotten to the point that they stopped asking for logs and I have not had a request for further info which is usually the point that the next comment will have some better info.

Ok, got an answer from IBM. That link, the requesttype=version one, plays no role whatsoever in the autoselection process. The diagnostics -r don’t really diagnose anything. The only info they provide is hop count and the parsing of relay.dat… the tool actually ignores the relay affiliation lists. In my mind, then, there is no reason to even bother with that tool.

The only way to really understand what is going on is to bump logging up to max, run the diagnostic fixlet -r, trigger auto select, then gather the logs and look there. Awesome.

Who knows what the Relay Version column was for, but it doesn’t matter as it is not part of the algorithm.

basically the real one works like this:
parses relay.dat.
looks at affiliation list.
does a tracert with TTL plus 1 on each of the relays it finds in the first affiliation group applicable to that client until it hits one.
When it does it then does a register, NOT a version request.
If register is successful then done.
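
The steps above can be sketched roughly as follows. This is only a reading of the description in this post, not the client's actual code: the data shapes, the `autoselect` name, the `try_register` callback, falling through to the next affiliation group when nothing in the first one registers, and the 20-hop ceiling (borrowed from the diagnostics output earlier in the thread) are all assumptions.

```python
def autoselect(relays, affiliation_list, try_register, max_hops=20):
    """Pick a relay the way the steps above describe, under stated assumptions.

    relays: list of dicts with 'name', 'groups' (set of affiliation group
        names) and 'hops' (int, or None if the relay never answered).
    affiliation_list: the client's affiliation groups, in priority order.
    try_register: callable(name) -> bool standing in for the register request.
    """
    for group in affiliation_list:
        candidates = [r for r in relays if group in r["groups"]]
        # Raise the TTL one hop at a time, so the nearest relays in this
        # group are tried before the more distant ones.
        for ttl in range(1, max_hops + 1):
            for relay in candidates:
                if relay["hops"] == ttl and try_register(relay["name"]):
                    return relay["name"]  # register succeeded, so done
    return None  # nothing in any applicable group could be registered with
```

Note there is no version request anywhere in this flow, which matches the answer that the requesttype=version URL plays no part in selection.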

I did not see a ping dependence there… but who knows what other undocumented goodies arise.

Anyway… the -r was a rabbit hole that is not worth exploring… if anybody else ever uses it for anything other than hop count, now we know.

I might’ve said ping, but meant it as shorthand for ICMP. The tracert part of selection uses ICMP.