Clients not connecting to relay during relay selection

Tracking down the last few systems that won’t cooperate with relay selection is never easy. Your assessment is most likely correct, but here is the best way to pin it down (it sounds like you have already done most of this):

  • Enable client debug logging at level 10000
  • Deploy a Force Automatic Relay Selection task
  • Gather Client Diagnostics with the Relay Selector test (if the number of relays is 250 or fewer)

The debug log will show you how relay selection looks from the client’s side (which relays it is sending pings to, whether it is getting a response, etc.). The diagnostics will show you whether the relays can be reached by ICMP and/or TCP from the Windows perspective. The two should agree; if they don’t, that would suggest a bug in the agent code.
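If you want an independent check of the TCP half of that comparison, a small script run on the affected client can help. This is only a minimal sketch, not what the diagnostics tool itself does: the function names are made up for illustration, and the default port of 52311 is an assumption about your deployment, so change it if your relays listen elsewhere. It does not test ICMP, which needs raw sockets and elevated rights.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_relays(relays, port: int = 52311, timeout: float = 3.0) -> dict:
    """Map each relay hostname to its TCP reachability.

    The default port is an assumption; pass the port your relays actually use.
    """
    return {host: tcp_reachable(host, port, timeout) for host in relays}
```

Calling `check_relays(["relay01.example.com", "relay02.example.com"])` (hypothetical hostnames) returns a dictionary of hostname to True/False that you can set side by side with what the debug log and the diagnostics report.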

Assuming they do align and the pings are not getting a response, the next step is to look at the network layer. That usually means running a Wireshark capture on both the client and the relay to confirm whether the requests actually leave the client and are received by the relay. I expect you will find the packets are being lost along the way (e.g. you can see them leaving the client, and maybe crossing some intermediate routers, but never arriving at the relay). That would point to a problem in the network infrastructure between the two systems, or in the NIC/network driver layer of the relay system.
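When you have captures from several points, it helps to write down where the packets were and were not seen, in path order. The snippet below is just a bookkeeping sketch for that; the function name and the capture point labels are hypothetical, and you fill in the seen/not seen values yourself from the captures.

```python
def first_missing_hop(observations):
    """Return the first capture point where the packets were not seen.

    observations: list of (capture_point, seen) pairs ordered from the
    client towards the relay. Returns None if every point saw the packets.
    """
    for capture_point, seen in observations:
        if not seen:
            return capture_point
    return None


# Example: packets leave the client and cross one router, but never arrive.
path = [("client NIC", True), ("site router", True), ("relay NIC", False)]
print(first_missing_hop(path))
```

The loss sits between the returned capture point and the last point before it that did see the packets, which tells you which segment or device to look at next.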