Migrating bfxmaster... Advice from the Pros?

jgstew · December 2, 2016, 5:39pm

This should only happen if you took action on the old root server setup after taking the DB snapshot and restoring it to the new server.

Ideally you would turn off all of the BigFix processes on the old root server, then back up and copy everything over to the new one, start it up, and make the switch.

It does make sense to make the heartbeat, minimum report interval, and minimum analysis time less aggressive before doing the switchover, and then ramp them back up once everything seems to be working well on the other side. Too late now, though.
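To put rough numbers on why a longer heartbeat helps, here is a back-of-the-envelope sketch. It assumes every client checks in once per heartbeat and that check-ins are spread evenly over time, which is a simplification. The function name and the endpoint counts are hypothetical and not taken from this environment.

```python
def heartbeat_reports_per_second(endpoints, heartbeat_seconds):
    """Rough average check-in rate at the root, assuming one check-in
    per client per heartbeat, spread evenly (a simplifying assumption)."""
    if heartbeat_seconds <= 0:
        raise ValueError("heartbeat_seconds must be positive")
    return endpoints / heartbeat_seconds


# Hypothetical deployment of 9,000 endpoints:
aggressive = heartbeat_reports_per_second(9000, 15 * 60)  # 15-minute heartbeat
relaxed = heartbeat_reports_per_second(9000, 60 * 60)     # 60-minute heartbeat
print(aggressive, relaxed)
```

Under those assumptions, going from a 15-minute to a 60-minute heartbeat cuts the average check-in rate to a quarter, which gives FillDB more room to catch up.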

What is the storage speed on the new server where FillDB is, as well as the MSSQL/DB2 instance? It isn’t a bad idea to put those on SSD storage because they are more sensitive to IOPS than the rest of the root server stuff. If FillDB doesn’t catch up then you’ll probably need to consider faster storage.

How many endpoints? How many operators?

Yep, that is fine. The only thing that could get confused is the BigFix client on the root server itself, but it should connect to localhost and not to an actual relay FQDN, so that should not be a problem. Even if it were, it should still work correctly; it would just be in the odd case of reporting to the fake root first, with things then getting passed along to the real root.

They may have to run relay autoselection again or something. I’ve never actually done this before, but the clients will think they are still talking to the same relay, when in fact they are not. I think the relay should make them reregister, or perhaps just accept their connections silently as if nothing changed.

You could set up a test client and relays to play around with this scenario. Set the client for manual relay selection to talk to a specific relay, then change what that relay's FQDN resolves to so that it points at a different relay, and see what happens.
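When running a test like this, it helps to confirm what the relay FQDN actually resolves to from the test client before and after the swap. Below is a minimal sketch using Python's standard `socket.getaddrinfo`. The helper names `resolve_ips` and `points_at` are my own, the `resolver` parameter is only there so the lookup can be stubbed out, and any hostname or IP you pass in is a placeholder for your own test relay.

```python
import socket


def resolve_ips(fqdn, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that fqdn resolves to."""
    results = resolver(fqdn, None)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({result[4][0] for result in results})


def points_at(fqdn, expected_ip, resolver=socket.getaddrinfo):
    """True if expected_ip is among the addresses fqdn resolves to."""
    return expected_ip in resolve_ips(fqdn, resolver)
```

Run `resolve_ips` with the test relay's FQDN on the client before and after the DNS change. Keep in mind that DNS caching on the client can delay the switch, so the result may not change immediately.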

Probably not a terrible idea, but probably not required either. It is usually the root that is more likely to be affected by things like this, not usually the relays, but maybe. How many clients are talking to the root?

You could do that, but if the real master server has an FQDN that doesn’t match what is in the masthead, you can just tell the fake root that its parent relay is the real master server’s FQDN (BFX04). The fake root doesn’t need to know that it is talking directly to the real master/root server. It only cares that it tries to talk to a relay and succeeds. (The root is also a relay.)

There is another thing that I would also recommend trying, which you should probably do first. If you are using Relay autoselection, then you should set the following on the real root server: _BESRelay_Selection_AutoSelectableRelay to 0

Read more here: Legacy Communities - IBM TechXchange Community

Neither the real root server nor the fake root should be set to be autoselectable, to encourage clients to talk to other relays instead. Clients will always fail over to the root server (the FQDN in the masthead) if they fail to connect any other way, but you only want them to try this if they have no better option.