BigFix WebUI Migration - DR Scenario

Does anyone have any advice or comments on how we should migrate our WebUI setup when we invoke DR?

Our current DR setup is as follows:

Prod Root Server in Location 1 with essential RoboCopy file transfers daily to DR Root Server in Location 2

Prod SQL Server in Location 1 log shipping to DR SQL Server in Location 2

When we invoke DR, we install BigFix from scratch in our DR location after promoting Location 2's SQL Server to primary and pointing the new installation at it.

Essentially this means everything is now live in Location 2, while Location 1 remains in place but with its services stopped (replicating a real DR scenario).

With our design, we would simply stand up a new WebUI installation in Location 2 and point it at the SQL Server in the same location. The problem is that WebUI doesn't follow the same rules as the root server or any other part of BigFix, because the documentation states:

Important Note: Only install the WebUI on one Endpoint per deployment. Make sure that no WebUI installation is already present in your deployment.

If I stop the WebUI service in Location 1, can I stand up a new WebUI installation in Location 2? Will the DB tolerate this?

As far as I can tell, standing up a new WebUI server isn't an option, so I wonder if anyone has any advice on this?

Our setup uses Microsoft failover clustering (MSFC) rather than DSA (which is no longer strategic). This is a Microsoft solution approved by HCL architecture; they have stated MSFC is their strategic direction and that you will eventually see it recommended in their planning guides for HA.

Today, some parts, like WebUI, do not play well with a cluster shared disk (where we first tried to put our \Program Files\). When we tried a cluster shared disk for \Program Files\, WebUI fell over and corrupted the DB, so we ended up installing \Program Files\ on local NVMe disk. I cannot stress enough how much performance improvement we got by putting our D:\Program Files on NVMe.

The very important part to note: from your primary node, you will need to run ServerKeyTool.exe to decrypt your keys. Take the files it creates, place them in a temp folder on your target server, and run ServerKeyTool.exe there to re-encrypt them. Then copy the encrypted files into an empty "?:\Program Files (x86)\BigFix Enterprise\BES Server\" folder that you create ahead of time, and proceed with the MSI install (a small staging sketch follows the file list below).

The files we are referring to are in the \BES Server\ folder and are called:
EncryptedAPIServerKey
EncryptedClientCAKey
EncryptedPlatKey
EncryptedServerSigningKey
EncryptedWebUICAKey
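
For illustration, here is a minimal sketch of that staging step once ServerKeyTool.exe has re-encrypted the keys on the target server. The drive letter, staging folder, and destination path are assumptions for the example; the ServerKeyTool.exe invocation itself is left out because I'm not going to guess at its switches here, so check its documentation for the exact decrypt/encrypt syntax.

```
@echo off
REM Hypothetical paths for illustration only - substitute your own drive letter and staging folder.
set "STAGE=C:\Temp\KeyStage"
set "BES_DIR=D:\Program Files (x86)\BigFix Enterprise\BES Server"

REM Create the empty BES Server folder ahead of the MSI install.
if not exist "%BES_DIR%" mkdir "%BES_DIR%"

REM Copy the five re-encrypted key files produced by ServerKeyTool.exe into it.
copy "%STAGE%\EncryptedAPIServerKey"     "%BES_DIR%\"
copy "%STAGE%\EncryptedClientCAKey"      "%BES_DIR%\"
copy "%STAGE%\EncryptedPlatKey"          "%BES_DIR%\"
copy "%STAGE%\EncryptedServerSigningKey" "%BES_DIR%\"
copy "%STAGE%\EncryptedWebUICAKey"       "%BES_DIR%\"
```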

Make sure you take a full DB backup before doing any work on your secondary node. You may also want to consult with HCL to confirm your steps.
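
If it helps, a copy-only full backup before touching the secondary node can be as simple as the following. The server name and backup path are assumptions for the example; BFEnterprise is the standard BigFix database name mentioned later in this thread.

```
REM Hypothetical SQL Server name and backup path - adjust for your environment.
sqlcmd -S PRODSQL01 -E -Q "BACKUP DATABASE [BFEnterprise] TO DISK = N'E:\Backups\BFEnterprise_preDR.bak' WITH COPY_ONLY, CHECKSUM, INIT"
```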

We use MSFC to control the BES Windows services when we move from one node to the other. Our DR is a stand-alone member server. We run a 7zip process that compacts and sends \BES Server\UploadManagerData\BufferDir\sha1 every hour to the DR server, then we unzip those 7zip files to do DR testing. We create 100 7zip files to represent \BES Server\UploadManagerData\BufferDir\sha1\nn\ where nn is the 00-99 parent folder names. We tried one large 7zip and that did not work so well, so we make a 7zip for each parent folder (100 of them).
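
As a rough sketch of that hourly job (the BES Server path, staging share, and 7-Zip location are assumptions for the example), one archive per 00-99 parent folder looks roughly like this:

```
@echo off
setlocal enabledelayedexpansion
REM Hypothetical paths - adjust the BES Server, staging, and 7-Zip locations for your environment.
set "SRC=D:\Program Files (x86)\BigFix Enterprise\BES Server\UploadManagerData\BufferDir\sha1"
set "DEST=\\DR-SERVER\DRStage\sha1"
set "SEVENZIP=C:\Program Files\7-Zip\7z.exe"

REM Build one archive per 00-99 parent folder instead of a single huge archive.
for /L %%i in (0,1,99) do (
    set "nn=0%%i"
    set "nn=!nn:~-2!"
    "%SEVENZIP%" a -mx=1 "%DEST%\sha1_!nn!.7z" "%SRC%\!nn!\*"
)
```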

We use robocopy to sync \BES Server\UploadManagerData\BufferDir\sha1 and \BES Server\wwwrootbes\Uploads between the active and passive nodes every hour. This is how we make BigFix cluster aware when it really isn't. One day, hopefully, we will be able to use cluster shared disk for \Program Files\ data and not have to worry about syncing.

We also use robocopy to sync the \BES Server\wwwrootbes\Uploads\ folder from the active node to the passive node AND to the DR node. Since \BES Server\wwwrootbes\Uploads\ should not see a lot of changes, we decided to use robocopy for syncing active/passive and to the DR machine.
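
For completeness, a minimal sketch of that hourly robocopy sync. The node names, drive letters, and log paths are assumptions; note that /MIR mirrors deletions as well, so be sure it is pointed at the right target before scheduling it.

```
@echo off
REM Hypothetical node names and paths - hourly sync from the active node to the passive and DR nodes.
set "BES=D:\Program Files (x86)\BigFix Enterprise\BES Server"

REM sha1 upload cache: active -> passive
robocopy "%BES%\UploadManagerData\BufferDir\sha1" "\\PASSIVE-NODE\D$\Program Files (x86)\BigFix Enterprise\BES Server\UploadManagerData\BufferDir\sha1" /MIR /R:2 /W:5 /NP /LOG:C:\Logs\sha1_sync.log

REM wwwrootbes Uploads: active -> passive and active -> DR
robocopy "%BES%\wwwrootbes\Uploads" "\\PASSIVE-NODE\D$\Program Files (x86)\BigFix Enterprise\BES Server\wwwrootbes\Uploads" /MIR /R:2 /W:5 /NP /LOG:C:\Logs\uploads_passive.log
robocopy "%BES%\wwwrootbes\Uploads" "\\DR-NODE\D$\Program Files (x86)\BigFix Enterprise\BES Server\wwwrootbes\Uploads" /MIR /R:2 /W:5 /NP /LOG:C:\Logs\uploads_dr.log
```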

We do other things to make BF cluster aware but won’t get into those details here.

1 Like

I believe we’re still putting the finishing touches on guidance to use OS clustering for root server disaster recovery, but WebUI brings a different set of challenges.

I believe our current guidance is still to build a new WebUI server as part of the recovery, rather than standing one up ahead of time. A WebUI installation and gather should not take long, and the necessary data is stored in the BFEnterprise database so there should not be a need to preserve data from the original WebUI instance.

There are a couple of issues with trying to run redundant WebUI instances. They would each try to use the same database, very likely corrupting each other. They also use a certificate for authenticating to the root server, and I believe only one certificate can be valid at a time, so installing a second WebUI should invalidate the first one's certificate and make it stop connecting.

2 Likes

This is the part I think will break us :frowning:

I've run the DB change fixlet for WebUI just now, but I can't see any way to get the application server itself running, as it will conflict like you said, even if I kill Location 1 for the time being. When Location 1 comes back online it will clash with Location 2, and in a DR scenario you can't be certain when that Location 1 server will return to active mode.

@jbruns2017 I remember talking to @cstoneba about that method way back when I started at Ref, as I wasn't sure the DR solution we had would work too well. Your method is waaaay better, but the biggest problem is the need for the Always On SQL Server, which I believe has quite a cost implication compared to what we have just now.

You can have 2 WebUI installations that live in harmony. You have to follow the keys process I detailed above to make it work.

Someday, WebUI will be MSFC aware and all of this key business goes away.

Isn't it the same process as migrating the root server to a new server too?

If so I’ve done this as a default part of the DR solution we have so I can test that WebUI scenario :slight_smile:

You just cannot have 2 WebUI services running at the same time. Hence MSFC to manage it all.
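
For a manual failover test without MSFC, the idea is simply to make sure the WebUI service is stopped on one node before it is started on the other. The service name BESWebUI below is my assumption; verify the actual name in services.msc on your WebUI host before relying on this.

```
REM Assumed service name (BESWebUI) - verify in services.msc before using.
REM On the node going passive:
sc stop BESWebUI
sc config BESWebUI start= disabled

REM On the node going active:
sc config BESWebUI start= auto
sc start BESWebUI
```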

1 Like

To migrate to anything other than your primary, I believe what I detailed will work. Confirm with HCL support.

1 Like

Cheers Joe! Much appreciated!

Don’t know if you already looked at the WebUI documentation: https://help.hcltechsw.com/bigfix/10.0/webui/WebUI/Admin_Guide/c_dsa.html

Hope this helps.

Yeah, unfortunately we don’t use DSA :frowning:

Well, the logic behind it is pretty much the same…
If you want to use the same WebUI installation, you have to follow the steps from step 3 to the end; if you want to install a DR WebUI, just stop the primary and install the new one (you can deploy it using the fixlet from the DR Console).
Lastly, if you want a "warm" DR WebUI (that is, a WebUI already installed, but with its services down), you probably need to revokewebuicredentials for the primary hostname and create a new set of keys (createwebuicredentials) for the DR hostname: these are keys used internally by the WebUI to communicate securely with the Root Server, and there can be only one valid set per Root Server. Switching back from DR to primary requires creating a new set of keys (or uninstalling and reinstalling the WebUI).
There can be only one WebUI instance at a time running and accessing the DB.

2 Likes