Relays go offline

Hi,

My issue is that newly installed Clients that also run a Relay disconnect from the Top Level Relay and start looking for the Main Server after being connected to the Top Level Relay for some minutes or hours.

My architecture is as follows: the Main Server can only be reached by the Top Level Relays, and the other Relays and Clients can only reach the Top Level Relays. So when a Relay starts looking for the Main Server, it fails.

At 20:54:59 +0800 -
[ThreadTime:20:54:45] ShutdownListener
[ThreadTime:20:54:45] SetupListener success: IPV4/6
At 20:58:41 +0800 -
ActiveDirectory: Refreshed Computer Information - Domain: (N/A)
At 22:34:35 +0800 -
Error posting report to: ‘http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe’ (General transport failure.
SOCKET CONNECT (winsock error 4294967288)
At 23:06:37 +0800 -
Error posting report to: ‘http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe’ (General transport failure.
SOCKET CONNECT (winsock error 4294967288)
At 23:06:38 +0800 -
Beginning Relay Select
At 23:06:40 +0800 -
GetRelayInfo: checking 'http://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version
GetRelayInfo: GetURL failed
Unrestricted mode
Configuring listener without wake-on-lan
At 23:06:44 +0800 -
[ThreadTime:23:06:40] ShutdownListener
[ThreadTime:23:06:40] SetupListener success: IPV4/6
RegisterOnce: Attempting secure registration with 'https://ROOT_FQDN:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.6.63&Body=15428630&SequenceNumber=45419&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://ROOT_FQDN%3a52311&AdapterInfo=50-9a-4c-0b-03-7a_10.71.72.64%2f26_10.71.72.66_0&AdapterIpv6=50-9a-4c-0b-03-7a%5efe80%3a%3ac0ac%3a7f00%3ae928%3ab151%2f64_0’
At 23:06:45 +0800 -
RegisterOnce: GetURL failed - General transport failure. - BAD SERVERNAME (winsock error 4294967290 - registration url - http://ROOT_FQDN:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.6.63&Body=15428630&SequenceNumber=45419&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://ROOT_FQDN%3a52311&AdapterInfo=50-9a-4c-0b-03-7a_10.71.72.64%2f26_10.71.72.66_0&AdapterIpv6=50-9a-4c-0b-03-7a%5efe80%3a%3ac0ac%3a7f00%3ae928%3ab151%2f64_0

The Relay comes back online for a while if the PC or the relay service is restarted, but it soon goes offline again. Please advise what needs to be done to keep these Relays online, or point me to any documentation I can look into.

million thanks,

Tony

One more thing: only clients with a Relay installed are disconnecting; clients without a Relay always stay connected. Please advise, thanks.

Tony
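
A side note on the logs above: the winsock error values (4294967288 and 4294967290) look like small negative numbers printed as unsigned 32-bit integers. That is an assumption about how the client formats the code, not something the log confirms, and the meaning of the resulting values is not covered here. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers in two's complement.
    return value - 0x100000000 if value >= 0x80000000 else value


# The two values that appear in the client log above.
for raw in (4294967288, 4294967290):
    print(raw, "->", to_signed32(raw))
```

Under that assumption the two codes read as -8 and -6.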

A system with a Relay on it should be configured for Manual Relay Select, not automatic select, so check that this is the case.

You should configure the client setting for FailoverRelayList if the root server is not reachable. The setting should direct clients to the top-level relays. See https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli+Endpoint+Manager/page/Configuration+Settings for details.


Hi Jason,
Thanks for the info. The lower-level relays are actually already set to “manual” relay selection, but they keep failing to connect after some period of time.

Also, I will have a look at the link, thanks,

Tony

Hi Jason,
The relay selection is set to “Manual” for these Relays. Can you please help explain why they would shut down the service and look for the main server?

many thanks,
Tony

Hi Jason,
The situation has not improved after configuring FailoverRelayList. I set the Client's FailoverRelayList to point to the same Top Level Relay, as there should be no network disconnection to the Server.

I got a similar error, shown below:

Using localhost. Parent Relay selected: WTCCN-FS-BFLR05.aswgcn.asiapacific.aswgroup.net. at: 10.82.29.115:52311 on: IPV4 (Using setting IPV4ThenIPV6)
At 14:56:05 +0800 -
[ThreadTime:14:55:52] ShutdownListener
[ThreadTime:14:55:52] SetupListener success: IPV4/6
At 16:15:17 +0800 -
Report posted successfully
At 18:06:04 +0800 -
Error posting report to: ‘http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe’ (General transport failure.
SOCKET CONNECT (winsock error 4294967288)
At 19:58:16 +0800 -
Report posted successfully
At 20:34:19 +0800 -
Beginning Relay Select
At 20:34:20 +0800 -
GetRelayInfo: checking 'http://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version
GetRelayInfo: Valid Relay
At 20:34:23 +0800 -
RegisterOnce: Attempting secure registration with 'https://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.6.63&Body=16063849&SequenceNumber=43390&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&MinHops=3&MaxHops=3&Root=http://WTCCN-FS-BFDB.aswgcn.asiapacific.aswgroup.net%3A52311&AdapterInfo=f4-8e-38-bc-56-85_10.71.3.0%2F26_10.71.3.2_0&AdapterIpv6=f4-8e-38-bc-56-85^fe80%3A%3Acdeb%3Ada8e%3A6fe3%3A4246%2F64_0
Unrestricted mode
Configuring listener without wake-on-lan
Registered with url 'https://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=9.5.6.63&Body=16063849&SequenceNumber=43390&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&MinHops=3&MaxHops=3&Root=http://WTCCN-FS-BFDB.aswgcn.asiapacific.aswgroup.net%3A52311&AdapterInfo=f4-8e-38-bc-56-85_10.71.3.0%2F26_10.71.3.2_0&AdapterIpv6=f4-8e-38-bc-56-85^fe80%3A%3Acdeb%3Ada8e%3A6fe3%3A4246%2F64_0
Registration Server version 9.5.6.63 , Relay version 9.5.6.63
Relay does not require authentication.
Client has an AuthenticationCertificate
Using localhost. Parent Relay selected: WTCCN-FS-BFLR05.aswgcn.asiapacific.aswgroup.net. at: 10.82.29.115:52311 on: IPV4 (Using setting IPV4ThenIPV6)
At 20:34:41 +0800 -
[ThreadTime:20:34:23] ShutdownListener
[ThreadTime:20:34:23] SetupListener success: IPV4/6
At 21:50:30 +0800 -
Report posted successfully
At 22:22:03 +0800 -
Error posting report to: ‘http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe’ (General transport failure.
SOCKET CONNECT (winsock error 4294967288)
At 22:46:07 +0800 -
FAILED to Synchronize - General transport failure. - SOCKET CONNECT (winsock error 4294967288 - gather url - http://127.0.0.1:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://sync.bigfix.com/cgi-bin/bfgather/assetdiscovery&Time=24Jun22:46:06&rand=25a72c97&ManyVersionSha1=da39a3ee5e6b4b0d3255bfef95601890afd80709
At 22:46:13 +0800 -
FAILED to Synchronize - General transport failure. - SOCKET CONNECT (winsock error 4294967288 - gather url - http://127.0.0.1:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://sync.bigfix.com/cgi-bin/bfgather/bessupport&Time=24Jun22:46:13&rand=26fea9de&ManyVersionSha1=da39a3ee5e6b4b0d3255bfef95601890afd80709

please advise,
Tony

You should open a PMR with IBM support to go into more detailed troubleshooting of your system. I do not have any more suggestions.

Hi Jason,
Many thanks anyway. As my BigFix reseller didn't provide a channel for opening a PMR, can you provide a bit more info?

And can you advise whether I can force a client relay to connect to a Top Level Relay by configuring FailoverRelayList?

thanks,

Tony

I would check the components below first.

TLR = Top Level Relay

  1. Primary and Secondary relay configuration. Relays should be configured with Manual relay selection.
  2. Failover Relay settings.
  3. Check whether MaxChildCount is set on the TLR and whether the TLR is reaching that number. If the TLR reaches the MaxChildCount value, it will reject any new incoming connections.
  4. Telnet to the TLRs on port 52311 to check that there is no issue with port connectivity.
  5. Ping the TLRs to identify any packet loss.
  6. Traceroute to the TLR to identify whether connectivity is dropping somewhere in between.
  7. If all of the above is fine and the Client Relays are still trying to contact the Main Server, enable debug mode with the value 10000 to see whether you can find any error registering with the relay.
  8. If you don’t find any issues in the checks above, use Wireshark to see whether there is any instability in network connectivity to the TLRs.

If none of the above gives any results, I would open a PMR, as suggested by Jason. But one of the above checks should show you where the problem is.

Please let me know how it goes.
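
Check 4 in the list above (port connectivity to the TLR on 52311) can also be scripted, which makes it easier to repeat over time and catch intermittent drops. This is only a rough sketch using Python's standard `socket` module. The function names are made up for illustration, and a successful TCP connect only shows that the port is reachable, not that the relay service behind it is healthy.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int = 52311, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_relays(hosts: Iterable[str], port: int = 52311) -> Dict[str, bool]:
    """Run the TCP check against each relay and collect the results."""
    return {host: can_connect(host, port) for host in hosts}
```

For example, `check_relays(["10.82.29.115"])` would test the TLR address that appears in the logs in this thread. Running it from the child relay itself, every few minutes, should show whether the port becomes unreachable around the time the relay goes offline.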

Hi Sraj,
Many thanks, I will try.

Tony

Hi Sraj,

  1. Confirmed: set to Manual.
  2. Failover Relay settings: configured to the same TLR.
  3. The MaxChildCount setting is empty, as nobody knows about the setting.
  4. Telnet to port 52311 succeeds.
  5. The TLR is configured not to respond to ping.
  6. tracert stops one hop before reaching the TLR. Will that cause a problem?
  7. I'm not sure where to enable debug mode. Is it configured on every TLR? Some of the TLRs serve a thousand endpoints. Will it impact clients, or just the server/TLR?
  8. Wireshark is a bit hard for me to use, as I'm no network expert. Is there any example you can provide that I can just copy to my environment?

million thanks.
Tony

Yes, blocking ICMP pings to the top-level relays can prevent child relays from selecting them (even with the child relays configured for manual relay select).
In this case, you should add the FailoverRelayList client setting, configured on the child relays, with values directing them to your top-level relays.

Clients (including child relays) first attempt to “ping” potential parent relays to determine which are available. If none respond to ping requests, the client (or child relay) attempts to contact the BES Root Server defined in the masthead (even without a ping response). Defining the FailoverRelay or FailoverRelayList client setting overrides that behavior: the client or child relay contacts the relay(s) listed in this setting instead of connecting to the root server.
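
The fallback order described above can be sketched as a small decision function. This is a simplified model of the behavior as explained in this post, not the actual client implementation. The function name, parameters, and host names are made up for illustration.

```python
from typing import Callable, List, Sequence


def choose_registration_targets(
    candidate_relays: Sequence[str],
    failover_relays: Sequence[str],
    root_server: str,
    responds_to_ping: Callable[[str], bool],
) -> List[str]:
    """Model which hosts a client would try to register with."""
    # Step 1: prefer parent relays that answer the availability "ping".
    reachable = [relay for relay in candidate_relays if responds_to_ping(relay)]
    if reachable:
        return reachable
    # Step 2: no relay answered, so a configured failover list takes over.
    if failover_relays:
        return list(failover_relays)
    # Step 3: with no failover configured, fall back to the root server.
    return [root_server]


# ICMP is blocked in this example, so no candidate ever answers.
blocked = lambda relay: False
print(choose_registration_targets(["tlr1"], [], "root", blocked))        # ['root']
print(choose_registration_targets(["tlr1"], ["tlr1"], "root", blocked))  # ['tlr1']
```

In the first call the client ends up at the root server, which matches the symptom in the original logs when the root is not reachable from the child relay.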


Hi Jason,
Thanks for the info. So can I take your advice this way: if the TLR is made pingable again, it might improve the client (and lower relay) disconnection issue?

Tony

Yes, that should improve things.

For initial registration, you’d still need a FailoverRelay set, or RelayServer1 / RelayServer2 at installation time (before the client has obtained the relay list). After initial registration, allowing ICMP or setting FailoverRelayList should maintain relay select capability.

Hi Jason,
Many thanks for the info, we will try. Thanks again,

Tony

Glad I could help, hope it goes well with you

Hi Jason,
My colleagues have tested it: even with the TLR pingable and reachable by tracert, the relays still easily go offline. Further, I looked into the relay log file of one of the disconnected relay clients, in Program Files\BigFix Enterprise\BES Relay\, and found a lot of “No buffer space” errors. Is that also a cause of the issue? And how do I tune the buffer space?

Note: 10.82.29.115 is the TLR in the log below.

Sat, 26 May 2018 22:44:58 +0800 - PeriodicTasks (1896) - GetExpectedVersionOfParent Error: HTTP Error 7: Couldn’t connect to server: Failed to connect to 10.82.29.115: No buffer space
Sat, 26 May 2018 22:44:58 +0800 - PeriodicTasks (1896) - Error running task UpdateAndSendRelayStatus: HTTP Error 7: Couldn’t connect to server: Failed to connect to 10.82.29.115: No buffer space
Sat, 26 May 2018 22:46:43 +0800 - /cgi-bin/bfenterprise/clientregister.exe (16492) - Uncaught exception in plugin ClientRegister with client 10.70.70.3: HTTP Error 7: Couldn’t connect to server: Failed to connect to 10.82.29.115: No buffer space
Sat, 26 May 2018 22:47:20 +0800 - /cgi-bin/bfenterprise/clientregister.exe (11948) - Uncaught exception in plugin ClientRegister with client 10.70.70.3: HTTP Error 7: Couldn’t connect to server: Failed to connect to 10.82.29.115: No buffer space
Sat, 26 May 2018 22:47:25 +0800 - /cgi-bin/bfenterprise/clientregister.exe (16444) - Uncaught exception in plugin ClientRegister with client 10.70.70.3: HTTP Error 7: Couldn’t connect to server: Failed to connect to 10.82.29.115: No buffer space
Sat, 26 May 2018 22:48:10 +0800 - /cgi-bin/bfenterprise/clientregister.exe (13452) - Uncaught exception in plugin ClientRegister with client 10.70.70.3: HTTP Error 7: Couldn’t connect to server: Failed to connect to 10.82.29.115: No buffer space

Million thanks.
Tony

Is the top level relay itself doing OK? Do you have a PMR open? (You’ll probably need one.)

If your top level relay is healthy and not giving error messages, I expect there may be something wrong in your network path or the network configuration on your child relay. Are you doing anything to restrict TCP/IP sockets (like defining a small ephemeral port range)?
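
One way to look into the socket question on the child relay: on Windows, “No buffer space” typically corresponds to Winsock error WSAENOBUFS (10055), which is often seen when a host runs short of ephemeral ports or socket buffers. Whether that is what is happening here is an assumption. The configured dynamic port range can be shown with `netsh int ipv4 show dynamicport tcp`, and `netstat -an` lists current connections. The sketch below counts TCP connections by state from that output. It assumes the Windows `netstat` column layout (protocol, local address, remote address, state), and the function names are made up for illustration.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in Windows-style `netstat -an` text."""
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row shape: TCP <local addr> <remote addr> <STATE>
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


def current_tcp_states() -> Counter:
    """Run `netstat -an` on this machine and summarize the TCP states."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)
```

A very large number of connections in states such as TIME_WAIT or ESTABLISHED, compared with the size of the dynamic port range, would be one hint that the machine is running out of ports.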

Hi Jason,
I asked my reseller, but they didn't provide me with any channel to IBM BigFix. How do I actually submit a PMR? Can you give me some info?

I guess my TLRs are healthy, as not all 6 would go wrong at the same time, right? I don't know if my network colleagues restrict anything, as I am no network expert either. Can you suggest any command I can try to see the current settings?

million thanks.

Tony

You’ll need an IBM ID to log in to support and open PMRs (which I think have been renamed to TS now to be more confusing).

If you don’t have an IBM ID, you should be able to create one and register for support using your customer number or agreement number. If you don’t have those and your reseller is defunct or uncooperative, the IBM licensing folks should be able to retrieve your customer number given the serial number in your masthead file.

Let us know your current standing so we can determine where best to direct you.
