BigFix Enterprise Suite Disaster Recovery

Summary

This document describes the disaster recovery plan for the BigFix Enterprise Suite (BES), including different strategies for handling disaster recovery and high availability requirements. There are two common strategies for dealing with failures of the BES Servers: High Availability using the Distributed Server Architecture (DSA) and Standard backup/restore.

Distributed Server Architecture (DSA)

BES has a very sophisticated built-in ability to install multiple BES Servers that replicate information from each other. In the event of a failure of one BES Server, the other BES Servers will automatically take over as fully functional BES Servers (they will receive data from the BES Relays and BES Clients and accept BES Console connections). When the failed BES Server is restored, it will automatically receive updated information.

Pros/Cons

  • + Allows for high-availability of the BES system.
  • + Adds protection for hardware failures or network failures.
  • + Automatic failover when one BES Server fails.
  • + Automatic recovery when a failed BES Server comes online again.
  • + Multiple BES Servers can operate independently in "split" network environments and will automatically reconcile information when the network is restored.
  • - Requires additional hardware for each BES Server (all BES Servers should be similarly powered computers because they will all maintain all data from the system).
  • - Requires some additional setup and maintenance work to ensure that replication is working properly.

DSA Procedure

Please see the DSA Setup Instructions for more information about setting up DSA.

Standard Backup / Restore

The standard backup / restore method is commonly used as a simple method of allowing for disaster recovery in BES. The general method is to do periodic backups (usually nightly) of the BES Server and database files. In the event of a problem, the database and BES Server files can be restored to the BES Server computer (or another computer) and the system will be restored. This is sometimes called a "Cold Standby" method of disaster recovery.
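As an illustration of the nightly database backup, the following is a minimal sketch that only builds the T-SQL BACKUP DATABASE statements for the two BES databases; it does not connect to SQL Server. The backup folder and the dated file naming are assumptions made for the example, not part of the BES product. The statements could be run from a scheduled job or used as a cross-check against a maintenance plan.

```python
from datetime import date

# Database names used by the BES Server, as named in this document.
DATABASES = ["BFEnterprise", "BESReporting"]


def backup_statements(backup_dir, day):
    """Return one T-SQL BACKUP DATABASE statement per BES database.

    backup_dir is a hypothetical folder that the SQL Server service can
    write to. Each file name carries the date, so several nightly copies
    can be kept side by side.
    """
    stamp = day.strftime("%Y%m%d")
    statements = []
    for name in DATABASES:
        target = f"{backup_dir}\\{name}_{stamp}.bak"
        # WITH INIT overwrites any backup sets already in the target file.
        statements.append(
            f"BACKUP DATABASE [{name}] TO DISK = N'{target}' WITH INIT;"
        )
    return statements


if __name__ == "__main__":
    for statement in backup_statements(r"D:\Backups", date.today()):
        print(statement)
```

Pointing backup_dir at a share on a remote system is one way to keep the backup copies off the BES Server computer.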

Pros/Cons

  • + Simple and easy and allows for multiple backups over time.
  • + Does not require any additional hardware (hot or cold standby computer is optional).
  • - All information since the last backup will be lost in the event of a failure.
  • - There might be significant downtime while the system is restored from the backup.

Backup Procedure

  1. Using SQL Server Enterprise Manager, establish a maintenance plan for nightly backups of the BFEnterprise and BESReporting databases -- Multiple backup copies allow for greater recovery flexibility. Consider backing up to a remote system, which allows for higher fault tolerance.
  2. The following files/folders are used by the BES Server. In the event of a failure, these files can be rebuilt automatically by the server, but backing them up will allow for faster recovery.
    • [BES Server folder]\ClientRegisterData\registrationlist.txt -- Information about last known IP address of computers.
    • [BES Server folder]\Mirror Server\Inbox\bfemapfile.xml -- Information necessary for BES Agents to get actions/Fixlets.
    • [BES Server folder]\sitearchive -- Information necessary for BES Agents to get actions/Fixlets.
    • [BES Server folder]\wwwrootbes\bfsites -- Information necessary for BES Agents to get actions/Fixlets.
    • [BES Server folder]\wwwrootbes\bfmirror\bfsites -- Information necessary for BES Agents to get actions/Fixlets.
    • [BES Server folder]\wwwrootbes\bfmirror\downloads -- Contains the download cache.
    • [BES Server folder]\wwwrootbes\Uploads -- Contains files uploaded to the system (Warning! Unlike the other items in this list, the server cannot rebuild these files automatically; they can be recovered only if you have backed them up somewhere).
  3. Securely back up site credentials, license certificates, and publisher credentials -- The license.pvk, license.crt, and publisher.pvk files are critical to the security and operation of BES. If the private key (pvk) files are lost, they cannot be recovered. These files must be securely backed up.
  4. Back up the user account information in SQL Server -- The database usernames and privileges are stored in the master database on SQL Server and will need to be restored in the event of a failure (otherwise all logins would need to be recreated). Information on how to back up SQL Server login information is available at: http://support.microsoft.com/kb/246133/.
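The file and folder copy in step 2 can be scripted. The following is a minimal sketch, assuming Python 3.8 or later is available on the BES Server computer; the function name and the two directory arguments are hypothetical and chosen for illustration. It copies only the items listed in step 2 and reports any that were not found. The credential files from step 3 are deliberately left out, since they need separate, secure handling.

```python
import shutil
from pathlib import Path

# Paths relative to the BES Server folder, taken from step 2 above.
# Forward slashes are used so the same list works with pathlib on Windows.
BACKUP_ITEMS = [
    "ClientRegisterData/registrationlist.txt",
    "Mirror Server/Inbox/bfemapfile.xml",
    "sitearchive",
    "wwwrootbes/bfsites",
    "wwwrootbes/bfmirror/bfsites",
    "wwwrootbes/bfmirror/downloads",
    "wwwrootbes/Uploads",
]


def backup_server_files(server_dir, backup_dir):
    """Copy every listed item that exists and return the ones that were missing."""
    server_dir = Path(server_dir)
    backup_dir = Path(backup_dir)
    missing = []
    for rel in BACKUP_ITEMS:
        src = server_dir / rel
        dst = backup_dir / rel
        if src.is_dir():
            # Copies the whole folder tree, merging into an existing copy.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        elif src.is_file():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        else:
            missing.append(rel)
    return missing
```

A call such as backup_server_files(bes_server_folder, nightly_backup_folder) returns a list of missing items; an Uploads entry in that list deserves attention because the server cannot rebuild those files on its own.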

Recovery Procedure

  1. Using either the previous BES Server computer or a new computer, install SQL Server (use the same version of SQL Server as was previously used). Make sure you are using "Mixed Mode" authentication.
  2. Make sure the new BES Server computer can be reached on the network using the same URL that is in the masthead file (for instance: http://192.168.10.32:52311/cgi-bin/bfgather.exe/actionsite OR http://bigfixserver.company.com:52311/cgi-bin/bfgather.exe/actionsite). Important Note: To avoid issues where the BES Clients connect to the BES Server before it is fully restored, it is best to make sure the BES Server is not available on the network until the migration is complete (e.g., don't push the DNS update until the migration is complete).
  3. Install the BES Server component using the masthead file.
  4. Stop all the BES Server services (BES FillDB, BES GatherDB, BES Gather, BES RootServer, BES Web Reports Server).
  5. The BES Server install will create new BFEnterprise and BESReporting databases. Delete these databases using SQL Server Enterprise Manager and then restore the databases from the backups.
  6. Restore the SQL Server login information: http://support.microsoft.com/kb/246133/.
  7. Restore the backed up files/folders (overwriting any existing files).
  8. Start all the BES Server services.
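Steps 4 and 8 can be scripted with the Windows net stop and net start commands. The following is a minimal sketch, assuming the service names listed in step 4 match the service display names on the BES Server computer; the function names are hypothetical. The services are handled in the order the document lists them, and no dependency order between them is assumed.

```python
import subprocess

# Service names as listed in step 4 (assumed to be the display names).
BES_SERVICES = [
    "BES FillDB",
    "BES GatherDB",
    "BES Gather",
    "BES RootServer",
    "BES Web Reports Server",
]


def service_commands(action):
    """Build one 'net stop' or 'net start' argument list per BES service."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [["net", action, name] for name in BES_SERVICES]


def run_all(action):
    """Run the commands on the BES Server computer and collect exit codes.

    A non-zero exit code is recorded rather than raised, because net also
    reports an error when a service is already in the requested state.
    """
    results = []
    for command in service_commands(action):
        completed = subprocess.run(command)
        results.append((command[2], completed.returncode))
    return results
```

Calling run_all("stop") before restoring the databases and files, and run_all("start") afterwards, mirrors steps 4 and 8; it needs an administrative command prompt on Windows.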

Verification of Restore

To make sure that the BES Server has been successfully restored, perform the following steps:
  1. Check the BES Diagnostics to make sure all services are properly started.
  2. Log in with the BES Console and verify that the logins work properly and the database information was properly restored.
  3. BES Clients and BES Relays should soon notice that the server is available and will begin reporting data to the server. Full recovery with all agents reporting will usually take anywhere from a few minutes to many hours (depending on the size of the deployment and how long the server was unavailable). In any circumstance, at least some agents should be reporting updated information within an hour or so.
  4. After verifying some agents are reporting properly, send a "blank action" (Tools > Take Custom Action, target "All Computers", click OK) to all computers. The blank action will not make any changes to the agent computers, but the agents will report that they received the blank action. If most agents respond to a blank action, it is a very strong indicator that everything is working well because sending an action tests many core components and communication paths of BES.
  5. Log in to the web reports and ensure the data was restored properly.
  6. Contact BigFix Technical Support with any issues or questions.
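As a supplement to the checks above (not a replacement for the BES Diagnostics), the following is a minimal sketch that extracts the host and port from the masthead URL and tests whether anything is accepting TCP connections there. The function names are hypothetical, and a successful connection shows only that the port is open, not that the BES Server is healthy.

```python
import socket
from urllib.parse import urlsplit


def masthead_endpoint(url):
    """Return the (host, port) pair that BES Clients will connect to."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port if the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example URL from the recovery procedure in this document.
    example = "http://bigfixserver.company.com:52311/cgi-bin/bfgather.exe/actionsite"
    host, port = masthead_endpoint(example)
    print(host, port)
```

Running is_listening(host, port) from a computer on the same network as the BES Clients gives a quick indication of whether the restored server is reachable at the address in the masthead.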