I just want to bring this up again. The problem is still unsolved, but I have observed some strange behaviour on multiple relays.
Here are some logs from the logfile.txt of the relay:
Mon, 15 Jan 2024 14:23:55 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.100: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:23:58 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.101: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:23:59 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.102: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:23:59 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.103: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:24:01 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.104: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:24:02 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.105: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:24:14 -0300 - 4868 - PostResultsForwarder ( CURL Error 6 ): HTTP Error 6: Couldn't resolve host name: getaddrinfo() thread failed to start
Mon, 15 Jan 2024 14:24:15 -0300 - 6928 - 23: GetURL failure on http://rootserver.domain.com:52311/bfmirror/bfsites/manydirlists_1/__fullsite_3a85d217366a374a5acab7e0361b6e8: bad allocation
Mon, 15 Jan 2024 14:24:19 -0300 - 1848 - 23: GetURL failure on http://rootserver.domain.com:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite&ManyVersionSHA1=84a7a5a5f98ac948cca74f93&ExpectedManyVersionCRC=2883422624&Time=1705339459: bad allocation
Mon, 15 Jan 2024 14:24:19 -0300 - 3992 - Failed to create plugin thread for client 999.100.1.110: class WindowsPlatform::ThreadError
Mon, 15 Jan 2024 14:24:22 -0300 - 1976 - 23: GetURL failure on http://rootserver.domain.com:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite&ManyVersionSHA1=cb3aaa7a5a5f98ab3048cca74f93&ExpectedManyVersionCRC=2883422624&Time=1705339462: bad allocation
Mon, 15 Jan 2024 14:24:23 -0300 - 4868 - PostResultsForwarder: bad allocation
Mon, 15 Jan 2024 14:24:24 -0300 - /cgi-bin/bfenterprise/clientregister.exe (2916) - Uncaught exception in plugin ClientRegister with client 999.100.1.111: bad allocation
Mon, 15 Jan 2024 14:24:24 -0300 - /cgi-bin/bfenterprise/clientregister.exe (3348) - Uncaught exception in plugin ClientRegister with client 999.100.1.112: bad allocation
Mon, 15 Jan 2024 14:24:25 -0300 - 6864 - 23: GetURL failure on http://rootserver.domain.com:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite&ManyVersionSHA1=8b8cb3aaa5f98ab30b5b948c74f93&ExpectedManyVersionCRC=2883422624&Time=1705339465: bad allocation
Mon, 15 Jan 2024 14:24:34 -0300 - 4868 - PostResultsForwarder: bad allocation
The behaviour is as follows. The clients select their assigned relay. After a while (it can be 1, 3 or 5 days, but it always happens), they suddenly stop communicating with that relay and I see these strange log entries. That relay also stops reporting to the server (judging by its last report time), and there is nothing we can do from the console to wake it up.
When I log on to the relay, I see that both the BigFix Relay service and the BESClient service are running, which I find puzzling.
In the client log on the relay I see the following:
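To see how these relay errors cluster over time, a small script can tally them by type. This is only a minimal sketch: it assumes every relay log line has the shape "timestamp - source - message" as in the excerpt above, and the category labels and function names are my own, not anything BigFix provides.

```python
from collections import Counter
from email.utils import parsedate_to_datetime

# Substrings copied from the relay log excerpt; the labels on the
# right are hypothetical names chosen for this sketch.
CATEGORIES = {
    "ThreadError": "thread creation failed",
    "thread failed to start": "thread creation failed",
    "bad allocation": "bad allocation",
}


def parse_relay_line(line):
    """Split 'timestamp - source - message'; return None if it does not fit."""
    parts = line.strip().split(" - ", 2)
    if len(parts) != 3:
        return None
    stamp, source, message = parts
    try:
        when = parsedate_to_datetime(stamp)
    except (TypeError, ValueError):
        return None
    return when, source, message


def classify(message):
    """Map a log message to one of the categories above, else 'other'."""
    for needle, label in CATEGORIES.items():
        if needle in message:
            return label
    return "other"


def summarize(lines):
    """Count messages per category and remember when each first appeared."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        parsed = parse_relay_line(line)
        if parsed is None:
            continue
        when, _source, message = parsed
        label = classify(message)
        counts[label] += 1
        first_seen.setdefault(label, when)
    return counts, first_seen


if __name__ == "__main__":
    sample = [
        "Mon, 15 Jan 2024 14:23:55 -0300 - 3992 - Failed to create plugin thread "
        "for client 999.100.1.100: class WindowsPlatform::ThreadError",
        "Mon, 15 Jan 2024 14:24:23 -0300 - 4868 - PostResultsForwarder: bad allocation",
    ]
    counts, first_seen = summarize(sample)
    for label, n in counts.items():
        print(label, n, first_seen[label].isoformat())
```

Running it over the whole logfile.txt from each affected relay should show whether the thread failures always precede the "bad allocation" entries, and how long after a service start they begin.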
********************************************
Current Date: January 15, 2024
Client version 10.0.9.21 built for WINVER 6.0 i386 running on WINVER 10.0.20348 x86_64
Current Balance Settings: Use CPU: True Entitlement: 0 WorkIdle: 10 SleepIdle: 480
IP Address 0: 172.31.1.96
Host name: REDACTED
Computer ID: 9999999999
Executable Location: C:\Program Files (x86)\BigFix Enterprise\BES Client\BESClient.exe
File Log Location: C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\__Global\Logs
ICU 54.2 init status: SUCCESS
Agent internal character set: UTF-8
ICU report character set: UTF-8 - Transcoding Disabled
ICU fxf character set: windows-1252 (Latin 1 / Western European) - Transcoding Enabled
ICU local character set: windows-1252 (Latin 1 / Western European) - Transcoding Enabled
********************************************
At 14:10:04 -0300 -
Starting client version 10.0.9.21
FIPS mode disabled by default.
At 14:10:05 -0300 -
Cryptographic module initialized successfully.
Using crypto library libBEScrypto - OpenSSL 1.0.2zg 7 Feb 2023
Initializing Site: actionsite
Restricted mode
Initializing Site: BES Support
Initializing Site: BigFix Labs
Initializing Site: CustomSite_Windows
Initializing Site: Enterprise Security
Initializing Site: IBM License Reporting
Initializing Site: Patching Support
Initializing Site: Updates for Windows Applications
Initializing Site: Virtual Endpoint Manager
At 14:10:06 -0300 -
Initializing Site: mailboxsite
Initializing Site: opsite10
Processing Download plugins
Setting _BESClient_Download_FastHashVerify enabled: Off
Beginning Relay Select
GetRelayInfo: checking 'http://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=Version'
GetRelayInfo: Valid Relay
At 14:10:07 -0300 -
RegisterOnce: Attempting secure registration with 'https://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=10.0.9.21&Body=1083399088&SequenceNumber=7489&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://rootserver.domain.com%3a52311&AdapterInfo=00-50-56-a0-3b-63_IPADDRESSHERE_0'
Unrestricted mode
Configuring listener without wake-on-lan
Registered with url 'https://127.0.0.1:52311/cgi-bin/bfenterprise/clientregister.exe?RequestType=RegisterMe60&ClientVersion=10.0.9.21&Body=1083399088&SequenceNumber=7489&MinRelayVersion=7.1.1.0&CanHandleMVPings=1&Root=http://rootserver.domain.com%3a52311&AdapterInfo=00-50-56-a0-3b-63_
Registration Server version 10.0.9.21 , Relay version 10.0.9.21
Relay does not require authentication.
Client has an AuthenticationCertificate
Using localhost. Parent Relay selected: rootserver.domain.com:52311. at: 999.100.1.999:52311 on: IPV4 (Using setting IPV4ThenIPV6)
At 14:10:11 -0300 -
Entering Service Loop.
Starting Service Loop.
A2AServer::Start().
FAILED to Synchronize - General transport failure. - 'http://127.0.0.1:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite&Time=15Jan14:10:11&rand=feaef22a&ManyVersionSha1=84cbb8cb http failure code 503 - gather url - http://127.0.0.1:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite&Tim=5a1:01&adfae2&MyVerionSha1=84cbb8cb3aaa7a
Successful Synchronization with site 'mailboxsite' (version 9) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/mailboxsite1083398'
Successful Synchronization with site 'BES Support' (version 1486) - 'http://sync.bigfix.com/cgi-bin/bfgather/bessupport'
Successful Synchronization with site 'BigFix Labs' (version 55) - 'http://sync.bigfix.com/cgi-bin/bfgather/bigfixlabs'
At 14:10:12 -0300 -
Successful Synchronization with site 'CustomSite_Windows' (version 117) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/CustomSite_Windows'
Successful Synchronization with site 'Enterprise Security' (version 4316) - 'http://sync.bigfix.com/cgi-bin/bfgather/bessecurity'
Successful Synchronization with site 'IBM License Reporting' (version 155) - 'http://sync.bigfix.com/cgi-bin/bfgather/ibmlicensereporting'
Successful Synchronization with site 'Patching Support' (version 1083) - 'http://sync.bigfix.com/cgi-bin/bfgather/patchingsupport'
At 14:10:13 -0300 -
Successful Synchronization with site 'Updates for Windows Applications' (version 2074) - 'http://sync.bigfix.com/cgi-bin/bfgather/updateswindowsapps'
Successful Synchronization with site 'Virtual Endpoint Manager' (version 70) - 'http://sync.bigfix.com/cgi-bin/bfgather/virtualendpointmanager'
Successful Synchronization with site 'opsite10' (version 352) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/opsite10'
ActiveDirectory: Refreshed Computer Information - Domain: Domain
At 14:10:14 -0300 -
[ThreadTime:14:10:11] SetupListener success: IPV4/6
Encryption: optional encryption with no certificate; reports in cleartext
At 14:10:15 -0300 -
Report posted successfully
At 14:11:16 -0300 -
Successful Synchronization with site 'actionsite' (version 1403) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite'
At 14:23:18 -0300 -
Report posted successfully
At 14:31:21 -0300 -
Report posted successfully
At 14:39:23 -0300 -
Report posted successfully
At 14:47:21 -0300 -
Report posted successfully
At 14:55:32 -0300 -
Report posted successfully
At 15:05:54 -0300 -
ForceRefresh command received. Version difference, gathering action site.
At 15:05:56 -0300 -
Successful Synchronization with site 'actionsite' (version 1403) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite'
Gathering all operator/mailbox sites.
At 15:05:57 -0300 -
Successful Synchronization with site 'mailboxsite' (version 9) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/mailboxsite1083399088'
At 15:05:58 -0300 -
Successful Synchronization with site 'opsite10' (version 352) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/opsite10'
At 15:05:59 -0300 -
Error posting report to: 'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' (General transport failure.
'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' http failure code 503)
At 15:07:06 -0300 -
ForceRefresh command received. Version difference, gathering action site.
At 15:07:08 -0300 -
Successful Synchronization with site 'actionsite' (version 1403) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/actionsite'
Gathering all operator/mailbox sites.
At 15:07:10 -0300 -
Successful Synchronization with site 'mailboxsite' (version 9) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/mailboxsite1083399088'
At 15:07:12 -0300 -
Successful Synchronization with site 'opsite10' (version 352) - 'http://rootserver.domain.com:52311/cgi-bin/bfgather.exe/opsite10'
At 15:07:25 -0300 -
Error posting report to: 'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' (General transport failure.
'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' http failure code 503)
At 15:09:48 -0300 -
Full Report posted successfully
At 15:15:37 -0300 -
Report posted successfully
At 15:23:28 -0300 -
Error posting report to: 'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' (General transport failure.
'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' http failure code 503)
At 15:30:54 -0300 -
Report posted successfully
At 15:36:37 -0300 -
ActiveDirectory: User logged in - Domain: Domain User: 9999999
ActiveDirectory: Refreshed User Information - Domain: Domain User: 09999999
At 15:36:41 -0300 -
User interface process started for user '9999999'
At 15:38:40 -0300 - Patching Support (http://sync.bigfix.com/cgi-bin/bfgather/patchingsupport)
Fixed - Task: Windows Update Service - Start the service (fixlet:12003)
Relevant - Task: Windows Update Service - Stop the service (fixlet:12004)
At 15:38:46 -0300 -
Report posted successfully
At 15:40:09 -0300 -
Client shutdown (Service manager stop request)
The last part, “Client shutdown (Service manager stop request)”, happens every time, but when I check the server the service is running.
Any idea where I should start investigating? It’s worth noting that, after this happens, the clients on this relay switch to their failover relay or the root server.
The Event Viewer does not show the BESRelay service stopping.
This has been going on for so many months that I have started losing my hair.
Anyway, I hope to find a solution to this.
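To line these client-side events up against the relay log, the 503 failures and shutdown entries can be pulled out with their timestamps. Again this is just a sketch under assumptions: it relies on the "At HH:MM:SS -0300 -" header lines shown above, it carries the most recent header time forward to the lines that follow, and the function names are hypothetical.

```python
import re

# Header lines look like "At 15:05:59 -0300 -", sometimes followed by text
# on the same line (for example a site name).
AT_LINE = re.compile(r"^At (\d{2}:\d{2}:\d{2}) ([+-]\d{4}) -\s*(.*)$")


def events(lines):
    """Yield (time, message) pairs, attaching each message to the last 'At' header."""
    current = None
    for raw in lines:
        line = raw.strip()
        match = AT_LINE.match(line)
        if match:
            current = match.group(1)
            rest = match.group(3)
            if rest:
                yield current, rest
        elif line and current is not None:
            yield current, line


def interesting(lines):
    """Keep only the 503 failures and the client shutdown entries."""
    return [
        (when, message)
        for when, message in events(lines)
        if "http failure code 503" in message
        or message.startswith("Client shutdown")
    ]


if __name__ == "__main__":
    sample = [
        "At 15:23:28 -0300 -",
        "Error posting report to: 'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' (General transport failure.",
        "'http://127.0.0.1:52311/cgi-bin/bfenterprise/PostResults.exe' http failure code 503)",
        "At 15:40:09 -0300 -",
        "Client shutdown (Service manager stop request)",
    ]
    for when, message in interesting(sample):
        print(when, message)
```

The client log does not repeat the date on every entry, so for logs spanning several days the date from the "Current Date" banner would have to be tracked separately.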