Relay Selection Over VPN with Split Tunneling - Cloud DMZ Relays

mbartosh · May 5, 2020, 2:29pm

My company wants to implement split tunneling in order to get BigFix traffic off the VPN. We have one DMZ relay that is internet facing. Currently, our VPN clients automatically select relays in the internal network.

I don’t know how relay selection is going to work with split tunneling. I saw another forum post that said that with split tunneling the clients automatically selected the DMZ relays. Is that the way it will work? If so, I will probably need to add a couple of relay servers to our DMZ since we have at least 3500 clients working from home using VPN.

We have command polling set to one hour. If all of our clients on VPN are using DMZ relays, I would like to set the command polling to 15 minutes.
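As a rough sizing check for that change, the average polling load on the relays is just the client count divided by the poll interval. The sketch below uses the 3500-client figure from above and assumes polls are spread evenly over time, which real clients will only approximate.

```python
def polls_per_second(clients: int, interval_seconds: int) -> float:
    """Average command-poll requests per second, assuming polls are evenly spread."""
    return clients / interval_seconds


for minutes in (60, 15):
    rate = polls_per_second(3500, minutes * 60)
    print(f"{minutes} min interval: {rate:.2f} polls/sec")
# 60 min interval: 0.97 polls/sec
# 15 min interval: 3.89 polls/sec
```

Going from one hour to 15 minutes quadruples the poll rate, so the number of DMZ relays and the polling interval are worth deciding together.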

JasonWalker · May 5, 2020, 2:40pm

As the clients are set to auto-select, by default they will most likely prefer the internal relays, because the VPN tunnel hides the network hop count to the internal relays, while the Internet relay will appear to be more network hops away over the clear internet connection.

You could apply Relay Affiliations SeekList on the VPN clients and Advertisement list on the DMZ relays (I can get more detailed here if interested), or block the ICMP Pings from the VPN address pool to the internal Relays to prevent them from selecting the internal relays.

mbartosh · May 5, 2020, 3:12pm

Thanks Jason! That is interesting. I wonder: if a client sits on the internet for a while, say overnight, and then connects to VPN in the morning, I suppose it would stay connected to the internet relay. We have our failover set for 6 hours. It sounds like it is going to be hit or miss whether the internet relay or the internal relay gets selected, unless we use affiliation on our DMZ relays.

We are currently using affiliation seeklists and advertisement lists throughout the organization. We have a very nice relay affiliation program which was written by John Talbert. It uses a subnet.txt file to determine the location of the client, and then affiliates based on location name.

I have the feeling we are going to have to go through the implementation and see how it works, which isn’t really ideal.

JonL · May 5, 2020, 3:29pm

It sounds like you already have most of the ingredients; it’s just a matter of referencing them. We have a similar situation. We have primary and DR DMZ internet-facing relays as well as a VPN-focused internal relay. Via an open policy action to the clients, we have logic, based on their subnets, to do relay affiliation. Integrated into that: if the client is on a VPN subnet, it will prefer the internal VPN relay. We intentionally set up a designated relay for serving VPN clients so we could throttle its output so as not to overwhelm the internet connection. The throttle value was agreed to with our network engineers.

For public IPs (i.e., NOT on the list of internal subnets that you already have compiled), relay affiliation is set to prefer our primary DMZ relay, then our secondary DMZ relay. The DMZ relays are set in the Failover list.

If the client is on a public address, then, via policy action, we also set the following:

setting "_BESClient_Comm_CommandPollIntervalSeconds"="1200" on "{now}" for client
setting "_BESClient_Comm_CommandPollEnable"="1" on "{now}" for client
setting "_BESClient_PersistentConnection_Enabled"="1" on "{now}" for client
setting "_BESClient_Download_Direct"="1" on "{now}" for client

When the client either gets on VPN or on an internal IP, the policy action will set command polling, persistence, and direct download to 0.
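The subnet logic behind that policy action can be sketched roughly as follows. This is only an illustration in Python, not the actual BigFix relevance: the subnet ranges are hypothetical placeholders, and the setting names and values are the ones listed above.

```python
import ipaddress

# Hypothetical ranges; substitute the subnets from your own inventory.
VPN_SUBNETS = [ipaddress.ip_network("10.200.0.0/16")]
INTERNAL_SUBNETS = [ipaddress.ip_network("10.0.0.0/8")]


def classify(ip: str) -> str:
    """Return 'vpn', 'internal', or 'public' for a client address."""
    addr = ipaddress.ip_address(ip)
    # Check VPN first: in this example the VPN pool sits inside the internal range.
    if any(addr in net for net in VPN_SUBNETS):
        return "vpn"
    if any(addr in net for net in INTERNAL_SUBNETS):
        return "internal"
    return "public"


def client_settings(ip: str) -> dict:
    """Settings the policy action would apply for a client at this address."""
    public = classify(ip) == "public"
    flag = "1" if public else "0"
    settings = {
        "_BESClient_Comm_CommandPollEnable": flag,
        "_BESClient_PersistentConnection_Enabled": flag,
        "_BESClient_Download_Direct": flag,
    }
    if public:
        settings["_BESClient_Comm_CommandPollIntervalSeconds"] = "1200"
    return settings


print(classify("10.200.3.4"), classify("10.1.2.3"), classify("203.0.113.7"))
```

Note that a home network which happens to reuse one of the internal ranges would be misclassified by address alone; the sketch does not try to handle that case.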

This process has worked well for us so far. We have throttles set on VPN and DMZ relays such that we have not overrun our internet bandwidth even with large numbers of remote users.

mbartosh · May 5, 2020, 4:21pm

We have command polling enabled and set to 1 hour, and so far we have not had a need to change it when a client connects to VPN. Maybe it will be different with split tunneling turned on. I don’t know how it is going to behave.

I don’t understand what _BESClient_Download_Direct=1 does for you. The documentation on the setting is not clear to me.

JasonWalker · May 5, 2020, 6:57pm

_BESClient_Download_Direct=1 causes the client to run all downloads directly from their Internet sources - the client connects to microsoft.com to get the patches from them, instead of sending the download requests through the Relays.

That can drastically reduce your VPN / DMZ bandwidth usage, but you lose the ability to download things hosted directly on the BES Server, like Software Deployment packages (that reference the server name or http://127.0.0.1 in their download URLs).

jgstew · May 5, 2020, 9:48pm

In some ways, this is the best solution, because you want clients to generally prefer internal relays over DMZ relays, EXCEPT in this specific case.

The other option would be to block the VPN address pool from reaching the internal relays, but blocking only ICMP should have the same effect for AUTOMATIC relay selection while being less heavy handed.

We really need a “try direct first, then try relay” option for these cases.

trn · May 6, 2020, 1:17pm

Agreed. I have been looking at DMZ relays and there seems to be an option to favour the relay and failover to direct, but that is of no help and perpetual reconfiguration of clients that wander off-premises and back again is not practical.

jgstew · May 6, 2020, 2:43pm

You can actually do this with open actions. You could have 2 BigFix actions: one that sets a client talking to a VPN or DMZ relay to download direct, and one that sets a client talking to any other relay to download from the relay. This would be fully automated as long as the relevance is written correctly, either based upon IP range or by selected relay. The relevance may need to be updated if things change in the future (IP range or new relay), but there is no reason this can’t be fully automated. The bigger issue is the lack of a “download from relay on failure” setting.
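A minimal sketch of how such a pair of open actions stays mutually exclusive, written in Python rather than BigFix relevance. The relay names are hypothetical, and each "action" is modelled as relevant only when the setting actually needs to change, so a client converges to the right state and then both actions go quiet.

```python
# Hypothetical names for the VPN-serving and DMZ relays.
EXTERNAL_RELAYS = {"dmz-relay.example.com", "vpn-relay.example.com"}


def direct_action_relevant(selected_relay: str, download_direct: bool) -> bool:
    """Action 1: on a VPN/DMZ relay but not yet downloading direct."""
    return selected_relay.lower() in EXTERNAL_RELAYS and not download_direct


def relay_action_relevant(selected_relay: str, download_direct: bool) -> bool:
    """Action 2: on any other relay but still downloading direct."""
    return selected_relay.lower() not in EXTERNAL_RELAYS and download_direct


def reconcile(selected_relay: str, download_direct: bool) -> bool:
    """Apply whichever action is relevant and return the resulting setting."""
    if direct_action_relevant(selected_relay, download_direct):
        return True
    if relay_action_relevant(selected_relay, download_direct):
        return False
    return download_direct  # neither action relevant: nothing to do


# A laptop that moves from an internal relay to the DMZ relay and back again.
state = False
for relay in ("internal-relay.example.com", "dmz-relay.example.com",
              "internal-relay.example.com"):
    state = reconcile(relay, state)
    print(relay, "-> download direct:", state)
```

Keying on the selected relay instead of IP range is just one of the two options mentioned above; the same structure applies if the predicates test subnets.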

trn · May 6, 2020, 4:26pm

Sorry James, but I’m not convinced of that.

We currently have relevance on our patching actions so that clients on VPN do not run large patches during the working day. The problem is that whilst the relevance evaluates correctly, a client that boots up off-premises and off VPN, but already knows of the action, will very often still run the action once the VPN comes up - it would seem that the relevance on the action doesn’t re-evaluate frequently enough. The clients will happily complete a baseline that includes Windows 10 SSU, Security Patch and Office 365 whilst on the VPN, which generates much muttering from the firewall team.

jgstew · May 6, 2020, 5:01pm

The overall baseline relevance should be evaluated again right before the action starts, but it is only effective at that moment, so things changing after that would not affect it.

If the baseline has relevance to not run when VPN is connected, and the baseline starts before the VPN connects, then what you describe would be expected, because the overall baseline relevance is only evaluated before the baseline actions as a whole start.

A separate action outside of the baseline that sets the client setting to download direct, as described above, should be able to make this configuration change even while the baseline is running, in between components, which should solve this.

trn · May 6, 2020, 5:42pm

Thanks James,

That fits with my impression of what was happening, once my relevance wasn’t having the desired effect.

Having only had small-ish numbers of clients on VPN before the days of Covid, we’ve not felt the need for DMZ relays or restrictions on VPN. It is likely that there will continue to be a sizable population off-premises for the foreseeable future, so we are going to implement a DMZ relay (something else to be pen-tested) and the option to try direct first and fail over to relay would (I think) fit our requirements well.

Knowing that our relevance to detect whether the client is on or off premises is working, I think I will be trying the pair of configuration fixlets with persistent actions.

jgstew · May 6, 2020, 6:06pm

Would strongly recommend a DMZ relay if you don’t have one. Let us know if you need any help with it.

In the ideal case, the only connection the DMZ relay has to the internal network is to its own parent relay over port 52311 TCP/UDP and perhaps ICMP.

The DMZ relay should only be exposed to the public internet over 52311 TCP and ICMP. You can also turn on Client Authentication, which means the only way to connect to the DMZ relay as a client over the public internet is to have an existing client authentication certificate from the root server, obtained from a previous connection to an unauthenticated relay (either over VPN or on the internal network). Alternatively, you can create a password that lets new clients authenticate to the DMZ relays for the first time without a certificate.

Technically, if a properly isolated DMZ relay were to be compromised, and you have client report encryption enabled, then the contents of client reports could not be read on the relay even if they are captured there. Also, if actions are maliciously modified on the relay in an attempt to infect clients downstream, that will not work either, due to signing and validation of the content.

dpowers1 · May 7, 2020, 3:04pm

All excellent points that should help you get things working as seamlessly as possible. We have been doing this for clients quite a bit in the last few months as the number of “off-line” devices has increased dramatically.

Another thing to note, as I have had a few clients in this situation: even if you get them using the DMZ relay, depending on how your network is designed, that traffic flow could still saturate your company’s internet pipe (DMZ and VPN flows still travel over the same pipe). In this case we simply put a few “CLOUD” relays in their existing cloud infrastructure, making sure to adhere to everything folks already said in this thread:

Traffic from CLOUD relay only talks 52311 back to the “real” DMZ relay
Authenticating
Traffic open in the cloud to endpoints is 52311 and ICMP
Manually set the CLOUD relay to report to the “DMZ” relays
etc.

Now when devices travel outside the network, the BigFix agent detects this and attempts a connection to the CLOUD relay. We did not have to change the _BESClient_Download_Direct setting because they still wanted some self-service options to work that needed content from the relay infrastructure. Patches and “internet” downloads happened as normal, but only had to be sent ONCE to the cloud relays and were then downloaded from there by the 1,000s of client devices that now work from home.

The added benefit (for one client) was that their work-at-home force is spread across the country - placing the “CLOUD” relays in different cloud environments made network traffic even more efficient.
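To put rough numbers on the "sent ONCE to the cloud relays" point, the sketch below compares how much data crosses the corporate internet link with and without cloud relays. The payload size, client count, and relay count are made-up illustration values, and it assumes each cloud relay caches the payload after fetching it once.

```python
def corporate_link_gb(payload_mb: float, clients: int, cloud_relays: int = 0) -> float:
    """GB (decimal) crossing the corporate link to deliver one payload to all clients.

    Without cloud relays, every remote client pulls its own copy through the
    company's internet connection (via VPN or the on-prem DMZ relay).
    With cloud relays, only one copy per cloud relay crosses it.
    """
    copies = cloud_relays if cloud_relays > 0 else clients
    return payload_mb * copies / 1000


# Hypothetical example: a 500 MB patch bundle for 3000 remote clients.
print(f"no cloud relays: {corporate_link_gb(500, 3000):,.1f} GB")
print(f"3 cloud relays:  {corporate_link_gb(500, 3000, cloud_relays=3):,.1f} GB")
# no cloud relays: 1,500.0 GB
# 3 cloud relays:  1.5 GB
```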

jgstew · May 8, 2020, 3:38pm

This is a very good point, and a very good use case. I think some issues with the VPN are that the devices handling the VPN traffic can’t handle the encryption / decryption speeds required to saturate the internet pipe, but if they were much faster, or using a DMZ relay, then it would be easier for the at home devices to saturate the pipe, which would be a good thing as far as getting more data out, but a bad thing as far as saturating the pipe completely, or nearly.

This is a really good use case for Cloud DMZ Relays. You are basically making use of the fact that the Cloud has nearly infinite internet bandwidth as compared to the average company. Plus you could make the Cloud Relay caches large enough to minimize the traffic from the on prem Relays to the Cloud Relays.

Another point to make: just because the relay is in the cloud doesn’t mean that it is isolated from your on prem network or even your other cloud devices unless you make it that way. In general, Cloud environments tend to default to allowing very little traffic, but if you have security groups that open things up too much, then things are no longer isolated.

swiars · May 8, 2020, 3:54pm

My company blocked ICMP pings and TCP/UDP connections (on port 52311) from the VPN to the internal relays. Everything is working fine with command polling set to 30 minutes. My laptops still have automatic relay selection.
Laptops can connect to the DMZ relay within 5-10 minutes after the BESClient service starts. Because of our version (9.5.10) we can’t use the persistent connection setting, and my company doesn’t want to open the microsoft.com URLs (for the download direct setting).

jgstew · May 8, 2020, 4:11pm

Are you having any issues? It sounds like things are working fairly well.

There is a way to trigger a heavier version of command polling with a self service offer that can be useful when doing active support calls with someone connected to a DMZ relay: bigfix-content/fixlet/Trigger Gathering - BES Client.bes at main · jgstew/bigfix-content · GitHub

swiars · May 8, 2020, 5:03pm

We still don’t have any issues with this config. Clients report their results within 0-30 minutes. By the way, we limited our clients to 3 Mbps via a registry value.

jgstew · May 13, 2020, 5:01pm

To add to this discussion even further, the ideal case is that the final failover relay configured for all clients is a fully qualified domain name (FQDN) for a DMZ relay. This is an ideal use case for a “Cloud DMZ Relay” as discussed above.

The reason for this is that if all other relays are unreachable for some reason, then you ideally want a relay of last resort. If you don’t have such a relay FQDN configured ahead of time, you can’t go back and easily add it to all clients if everything goes down. It must be done ahead of time.

Even better is if a Cloud DMZ Relay can have a static public IP configured, and have both that IP and a fully qualified domain name for a Cloud DMZ Relay configured in all clients. This is only useful in the case where DNS is failing for some reason, but it happens sometimes.

The reason for having a fully qualified domain name (FQDN) that you control point to a relay is that you can change the relay that it points to whenever you need to without reconfiguring your clients. So really I mean an FQDN whose IP or CNAME target you can easily change if needed.
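The ordering described above (FQDN first, static IP only when DNS fails) can be illustrated with a small sketch. This is not how the BigFix client implements failover; the function name, hostname, and address are hypothetical, and the resolver is passed in so the behaviour can be shown without real DNS.

```python
import socket
from typing import Callable


def last_resort_address(fqdn: str, static_ip: str,
                        resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Prefer whatever the FQDN currently points to; fall back to the static IP."""
    try:
        return resolve(fqdn)
    except OSError:  # socket.gaierror (DNS failure) is a subclass of OSError
        return static_ip


def dns_down(name: str) -> str:
    """Stand-in resolver that simulates a DNS outage."""
    raise socket.gaierror("simulated DNS failure")


# Hypothetical relay name and documentation-range addresses.
print(last_resort_address("relay.example.com", "203.0.113.10",
                          resolve=lambda name: "198.51.100.5"))
print(last_resort_address("relay.example.com", "203.0.113.10", resolve=dns_down))
```

The first call returns the address the FQDN resolves to, which is what lets you repoint the name without touching clients; the second returns the preconfigured static IP because resolution failed.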