BigFix ROOT API webui-sites no longer working

Hi,
The WebUI connects to the Root Server (on port 52315) using HTTPS and authenticates using a client certificate.
From your investigation, we can read

schannel: disabled automatic use of client certificate

which suggests the root cause. I suppose the change comes from a Windows policy. A Google search might help.

Hello, the schannel message is generic; it appears only because the machine I am using to demonstrate the failure runs Windows.

Normally, whether or not you provide the certificate, and on Windows or Linux alike, the webui-sites section either responds immediately or rejects the connection (in my tests I am not providing the certificate). In this case it just “hangs” until the default timeout occurs.

This is my issue: why is the specific subsite for WebUI hanging? How can I reset its status?
Any help would be great.

Googling “schannel: disabled automatic use of client certificate” will provide no insight; this is specific to the WebUI subsite running on the BigFix root server.

In which log are you seeing those messages?

Hi Jason,

In terms of logs, we only have the service-wrapper log, which shows endless connection attempts without success. It took a while to figure out what was actually happening, because our original telnet tests to the specific port from the WebUI host succeeded:

Tue, 01 Aug 2023 08:28:15 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:29:15 -0400 – [WebUI] Stopping WebUI service app
Tue, 01 Aug 2023 08:30:11 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:31:12 -0400 – [WebUI] Stopping WebUI service app
Tue, 01 Aug 2023 08:32:08 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:33:09 -0400 – [WebUI] Stopping WebUI service app
Tue, 01 Aug 2023 08:34:09 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:35:10 -0400 – [WebUI] Stopping WebUI service app
Tue, 01 Aug 2023 08:36:06 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:37:06 -0400 – [WebUI] Stopping WebUI service app
Tue, 01 Aug 2023 08:38:00 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:39:00 -0400 – [WebUI] Stopping WebUI service app
Tue, 01 Aug 2023 08:40:03 -0400 – [WebUI] Failed to update service application: HTTP Error 28: Timeout was reached: SSL connection timeout
Tue, 01 Aug 2023 08:41:04 -0400 – [WebUI] Stopping WebUI service app

This was triggered by a reboot of the root server.
While this issue persists, accessing 52311/api is fine and webreports is also functioning correctly.

image

From testing on a working environment, this should not run until it triggers a timeout; it should simply reject the connection. So it seems specifically /webui-sites is currently hanging.

Ah, ok, I suspected that was just curl output but didn’t want to write much more until I was sure.

Curl is not going to be a good test case I’m afraid. The connection to :52315 is intended for use by WebUI only. It only accepts Certificate authentication, and the only certificate it accepts is the one generated during the WebUI install. You won’t be able to send WebUI’s client cert with curl, so that won’t be very helpful aside from checking that the 52315 port is reachable.

I believe the messages from curl about schannel are also unrelated; as far as I recall, WebUI does not use the Microsoft-specific Schannel library at all. We bring our own OpenSSL-based library instead.

If you don’t have a support case open already, please open one so the team can look at your system with you. From those entries in the service-wrapper.log, I’m not sure whether the timeout is with connecting to the root server on :52315, or if it’s the Node.js loopback connections trying to contact the WebUI apps.

In your configuration, are BESRoot and WebUI on the same host, or on different hosts?

I’d also check for WebUI port conflicts, especially the dynamic port range. If another service is already using one or more of the ports we can get initialization failures like this… in particular I’ve seen that the Nutanix Guest Additions conflicts with our default port range, if this happens to be a Nutanix VM that may be at least one of the issues.

There is a longer, very similar discussion in the thread “WebUI install not working, no logs”.

Hi Jason,

I’m not looking to curl to report a specific error; my point is purely about how the endpoint reacts. Also, I can complete a full connection ending in 200 OK by supplying the cert, key, and CA cert, so “You won’t be able to send WebUI’s client cert with curl” is incorrect if you supply all the info.

Here is an example:

What I am trying to show is that the connection times out and nothing more, which should not happen.
I will deep dive into the dynamic ports and port usage.
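
The distinction drawn in this thread (an immediate reject versus a hang until the timeout) can also be checked without curl. Below is a minimal Python sketch, not BigFix tooling: the function name probe_tls, its return labels, and the host name in the usage note are hypothetical, and it only reports how the TLS handshake on the port behaves.

```python
import socket
import ssl

def probe_tls(host, port, timeout=5.0, certfile=None, keyfile=None, cafile=None):
    """Classify the endpoint's reaction to a TLS handshake attempt."""
    context = ssl.create_default_context(cafile=cafile)
    if cafile is None:
        # Reachability check only: skip server verification when no CA is given.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    if certfile:
        # Optional client certificate (the paths are whatever you supply).
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return "handshake-ok"
    except socket.timeout:
        # TCP connect or TLS handshake never completed: the "hang" case.
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except ssl.SSLError:
        # The peer answered but the handshake was rejected or failed.
        return "tls-error"
    except OSError:
        return "other-error"
```

For example, probe_tls("root.example.com", 52315, timeout=10) returning "timeout" would match the hang described here, while "tls-error" or "handshake-ok" would mean something on the port is at least answering. Note that with TLS 1.3 a completed handshake does not prove the server accepted the client certificate; a real request would still be needed for that.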

WebUI and root are on different servers, and I get the same error when trying from the root server directly. Thanks for the continued support, much appreciated.

Hello,

Well this is fun, from what I’m seeing services.exe bound itself to 52313, default config is:

Protocol tcp Dynamic Port Range

Start Port : 49152
Number of Ports : 16384

Which means it had roughly a 0.006% chance (1 in 16384) of binding to the needed port… ?
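
As a quick check of the arithmetic, here is a small Python sketch using the start port and port count quoted above. It assumes a single, uniformly random pick from the dynamic range, which is a simplification and not necessarily how Windows allocates ports; the helper names are made up for illustration.

```python
def dynamic_range(start, count):
    """Return the first and last port of a dynamic range."""
    return start, start + count - 1

def in_range(port, start, count):
    first, last = dynamic_range(start, count)
    return first <= port <= last

START, COUNT = 49152, 16384  # values from the netsh output above

first, last = dynamic_range(START, COUNT)
print(f"dynamic range: {first}-{last}")
for port in (52311, 52315):
    print(port, "inside dynamic range:", in_range(port, START, COUNT))

# Chance that one uniformly random pick lands on one specific port.
print(f"one specific port: {1 / COUNT:.4%}")
```

Under that assumption the odds for any single port are 1 in 16384, and both 52311 and 52315 fall inside the default range 49152-65535.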

Did services.exe bind to 52313 or 52315? I’m not certain whether we use 52313 but it does seem vaguely familiar, will check my notes.

It’s also possible to exclude individual port numbers from being dynamically allocated by RPC; I’ll retrieve my notes on that as well. That should prevent these accidental duplications by automatic assignment in the dynamic range (but not in cases like Nutanix Guest, which specifically requests port 5000).

Usually on Windows Server the dynamic ports are allocated, in order, starting from 49152 and incrementing from there. We usually don’t have a problem because by the time the server reaches the 52311 and higher numbers, our services have already started and consumed those ports so the endpoint mapper doesn’t allocate them. Usually the exception is if the same server hosts DNS, because DNS pre-allocates thousands of ports and might grab the 523xx ports ahead of us.

So, the odds of your conflict are even smaller - not only did the port get allocated to begin with, but it was also allocated during the small window of time that our service wasn’t already using it.

…ok, I don’t see that we are using 52313.

If you’re getting a conflict on 52315, you can use this command to exclude 52315 from the range dynamically allocated by Windows. The BESRootServer service has to be stopped, along with whatever other application might conflict with 52315, because the netsh command will fail if port 52315 is in use:

netsh int ipv4 add excludedportrange tcp 52315 1

The ‘add excludedportrange’ affects both ipv4 and ipv6. If your server is ipv6 only, then exclude it on ipv6 instead

netsh int ipv6 add excludedportrange tcp 52315 1

If your server runs both ipv4 and ipv6 (the default), then you only need one command or the other; trying to run both would give a failure message on the second attempt because the exclusion is already in place.
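
To confirm afterwards that the exclusion is in place, the current list can be read back with "netsh int ipv4 show excludedportrange protocol=tcp". The following Python sketch parses that output; the helper names are hypothetical, and the assumed layout (data rows that begin with a start port and an end port) should be checked against what your server actually prints.

```python
import subprocess

def parse_excluded_ranges(netsh_output):
    """Extract (start, end) pairs from 'show excludedportrange' style output."""
    ranges = []
    for line in netsh_output.splitlines():
        fields = line.split()
        # Assumption: data rows begin with two integers, start port then end port.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            ranges.append((int(fields[0]), int(fields[1])))
    return ranges

def is_excluded(port, ranges):
    """True if the port falls inside any excluded range."""
    return any(start <= port <= end for start, end in ranges)

def current_excluded_ranges():
    """Windows only: query the live TCP exclusions for IPv4."""
    result = subprocess.run(
        ["netsh", "int", "ipv4", "show", "excludedportrange", "protocol=tcp"],
        capture_output=True, text=True, check=True,
    )
    return parse_excluded_ranges(result.stdout)
```

On the root server, is_excluded(52315, current_excluded_ranges()) should then return True once the exclusion has been added. Note that the add command takes a start port and a count ("52315 1"), while the listing is read here as start and end ports.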

( all of this assuming your actual conflict with services.exe was on 52315 and that your 52313 message earlier was an oversight? In any case I’m planning to publish some content on bigfix.me to check for / prevent a conflict of this type )

Hi Jason, yes, apologies: 52315 is the port in use, not 52313. I was just looking for a way to exclude the specific port.
The question here then becomes, shouldn’t the bigfix server claim the port first and services.exe ignore it?

Seeing as the range also includes 52311, in theory services.exe could also bind itself to that port?
:sleeping:

Also, that screenshot where you used the WebUI client cert to connect to 52315 with curl…that shows a successful connect, with what looks like a probably correct response. Is that from the working server or the failing server?

A working server that I’ve used to compare against the broken system; this led me to see how things “should look”.

So have you established that it actually is BESRootServer listening on 52315 on the broken server? netstat -anob or the Resource Monitor -> Network tab should show this
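
If you would rather script that check, here is a minimal Python sketch that parses "netstat -ano" output and returns the PIDs listening on a given port. The helper names are hypothetical, and it assumes the English column layout (Proto, Local Address, Foreign Address, State, PID) and the English state name LISTENING.

```python
import subprocess

def listeners_on_port(netstat_output, port):
    """Return the set of PIDs with a TCP socket in LISTENING state on the port."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State, PID.
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # rpartition copes with IPv6 literals such as [::]:52315.
        local_port = fields[1].rpartition(":")[2]
        if local_port == str(port):
            pids.add(int(fields[4]))
    return pids

def current_listeners(port):
    """Windows only: run netstat and report the listening PIDs for the port."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return listeners_on_port(result.stdout, port)
```

Calling current_listeners(52315) on the root server gives the PIDs to look up, for example with tasklist /FI "PID eq <pid>". More than one PID, or a PID that is not BESRootServer, would point to the kind of conflict discussed here.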

Hi Jason,

Yes I was able to see this via the service-app logs (from when it was working fine):

And here is the full tcpview capture:

The tcpview capture is from now, while it is not working?

Hi Jason, yes exactly. Whenever I see the IP of the WebUI host pop in, it’s hitting the services.exe instance and not BESRootServer.

Ok, what I’d suggest is rebooting the server (I don’t know whether the services.exe process itself can be restarted safely), then check that BESRootServer is the only process listening on 52315, and whether that allows WebUI to start up correctly.

If that resolves it, I can post a fixlet to create a permanent port exclusion to prevent 52315 from being automatically allocated by Windows.

Hi Jason,

The plan is to reboot at the end of the day today and see which port services.exe allocates itself; I will report back.
