Best option for DSA

gturne07 · October 2, 2018, 11:43am

We recently had a major BigFix outage, and upper management wants DSA configured in some capacity. I’d like to get a feel for what others are doing and what my best approach should be. Our main BF server is physical hardware with a local database. In addition, we have 2 TLRs, 4 DMZ relays, 85 regular relays and 50K+ endpoints. VMs and a remote database haven’t been options for us, so my assumption is that I would need another physical server with a local database. I’m reviewing the DSA documentation but wanted to poll the community for advice and gotchas.

AlexaVonTess · October 2, 2018, 12:12pm

Out of curiosity, what was the cause/reason for the outage? Hardware?

Aram · October 2, 2018, 2:01pm

DSA is certainly an option, and as you highlight, the secondary server should have specifications very similar (if not identical) to those of the primary/master BigFix Server. Additionally, you’ll want high bandwidth and low latency between the BigFix Servers (ideally, they would be in the same datacenter). The main recommendation with DSA is to properly scope and monitor the environment from a performance and capacity perspective, as it does add some overhead (primarily storage and network).

That said, another option is to implement a “cold-standby” which would be able to take over in the event of a failure. This has the advantage that it adds minimal (if any) overhead to the BigFix Server, and does not necessarily have the same network requirements. The disadvantage is that it requires a manual failover process (or at least one that is initiated external to BigFix), and there is a greater chance for data loss associated with action history.

gturne07 · October 11, 2018, 2:00pm

Well…we kind of had a perfect storm situation, compounded by years of DB maintenance neglect running on older hardware and software. We went into an environment refresh and never recovered. We would have these sporadic moments where endpoints would lose their minds, then we would have FillDB issues, and the main server would get hammered, causing significant slowness. In the past, we would come out of it within 4 to 6 hours. We didn’t recover this last time and were dead in the water for the better part of 2 weeks without a recovery strategy.

gturne07 · October 11, 2018, 3:03pm

Thanks for this information @Aram. Can you elaborate more on the “cold-standby”? It seems more in line with what we may be looking at doing. Right now our main server is physical with a local database, so our DSA options will start from this premise. One concern that’s been mentioned is the cost of having to purchase a server and SQL licenses to sit idle versus just having a server ready to MIGRATE to. Make sense?

vnovik · October 12, 2018, 9:41am

Hello gturne07,

The general method is to do periodic backups (usually nightly) of the server and database files. In the event of a problem, the database and server files can be restored to the BigFix server computer (or another computer) and the system will be restored. This is sometimes called a “Cold Standby” method of disaster recovery.

Pros:

Simple and easy and allows for multiple backups over time.
Does not require any additional hardware (hot or cold standby computer is optional).

Cons:

All information since the last backup will be lost in the event of a failure.
There might be significant downtime while the system is restored from the backup.
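
As a rough illustration of the nightly file-backup side of this approach, here is a minimal Python sketch. It is not an official BigFix tool: the function name make_backup, the archive prefix "server-files" and the retention count are all hypothetical, and it covers only the server files. The database should be backed up separately with the database product's own backup tooling, since copying live database files is generally not reliable.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def make_backup(source_dir: Path, backup_dir: Path, keep: int = 7,
                stamp: Optional[str] = None) -> Path:
    """Zip source_dir into backup_dir and keep only the newest `keep` archives.

    All names here are illustrative assumptions, not BigFix conventions.
    backup_dir should live outside source_dir (ideally on another machine).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A sortable timestamp means alphabetical order equals chronological order.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    base = backup_dir / f"server-files-{stamp}"

    # make_archive appends ".zip" and returns the full path of the archive.
    archive = Path(shutil.make_archive(str(base), "zip", root_dir=str(source_dir)))

    # Prune everything except the newest `keep` archives.
    archives = sorted(backup_dir.glob("server-files-*.zip"))
    for old in archives[:-keep]:
        old.unlink()

    return archive
```

A scheduled task could call make_backup once a night with the server's data directory as source_dir and a share on a different machine as backup_dir. The gap between runs is the window of data that would be lost in a failure, which is the first con listed above.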

You may find more details here:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli+Endpoint+Manager/page/Disaster+Recovery+Overview