BigFix Enterprise Suite Disaster Recovery
Summary
This document describes the disaster recovery plan for BigFix Enterprise Suite, including different strategies for meeting disaster recovery and high availability requirements. There are two common strategies for dealing with failures of the BigFix Servers: high availability using the Distributed Server Architecture (DSA), and standard backup/restore.
Distributed Server Architecture (DSA)
BigFix has a built-in ability to run multiple BigFix Servers that replicate information from each other. If one BigFix Server fails, the other BigFix Servers automatically take over as fully functional BigFix Servers: they receive data from the BigFix Relays and BigFix Clients and accept BigFix Console connections. When the failed BigFix Server is restored, it automatically receives updated information.
Pros/Cons
- + Allows for high availability of the BigFix system.
- + Adds protection against hardware and network failures.
- + Automatic failover when one BigFix Server fails.
- + Automatic recovery when the failed BigFix Server comes online again.
- + Multiple BigFix Servers can operate independently in "split" network environments and then automatically reconcile information when the network is restored.
- - Requires additional hardware for each BigFix Server (all BigFix Servers should be similarly powered computers, because each one maintains all of the data in the system).
- - Requires some additional setup and maintenance work to ensure that replication is working properly.
DSA Procedure
See the DSA Setup Instructions for more information about setting up DSA.
Standard Backup/Restore
The standard backup/restore method is a simple, commonly used approach to disaster recovery in BigFix. The general method is to take periodic backups (usually nightly) of the BigFix Server files and databases. In the event of a problem, the databases and BigFix Server files are restored to the original BigFix Server computer (or another computer), which brings the system back. This is sometimes called a "Cold Standby" method of disaster recovery.
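Keeping several nightly backups, rather than only the latest one, usually means pruning old copies on a schedule. The following is a minimal Python sketch of that retention step. It assumes each nightly run writes to its own folder named with a hypothetical `bigfix-backup-YYYYMMDD-HHMMSS` scheme (this naming is an illustration, not a BigFix convention) and only decides which folders to delete; it does not delete anything itself.

```python
from datetime import datetime

# Hypothetical naming scheme (an assumption, not a BigFix convention):
# one folder per nightly run, e.g. "bigfix-backup-20240102-030405".
PREFIX = "bigfix-backup-"
STAMP = "%Y%m%d-%H%M%S"


def backups_to_prune(folder_names, keep):
    """Return the backup folder names to delete so only the newest `keep` remain."""
    dated = []
    for name in folder_names:
        if not name.startswith(PREFIX):
            continue  # ignore anything that is not a backup folder
        try:
            stamp = datetime.strptime(name[len(PREFIX):], STAMP)
        except ValueError:
            continue  # ignore folders whose timestamp does not parse
        dated.append((stamp, name))
    dated.sort(reverse=True)  # newest first
    return [name for _, name in dated[keep:]]
```

For example, `backups_to_prune(os.listdir(backup_root), keep=7)` would list everything older than the newest seven runs. Selecting by parsed timestamp rather than by file modification time keeps the result stable even if folders are copied to another system.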
Pros/Cons
- + Simple, and allows for keeping multiple backups over time.
- + Does not require any additional hardware (a hot or cold standby computer is optional).
- - All information changed since the last backup is lost in the event of a failure.
- - There might be significant downtime while the system is restored from backup.
Backup Procedure
- Using SQL Server Enterprise Manager, establish a maintenance plan for nightly backups of the BFEnterprise and BESReporting databases -- Multiple backup copies allow for greater recovery flexibility. Consider backing up to a remote system for higher fault tolerance.
- The following files/folders are used by the BigFix Server. In the event of a failure, the server can rebuild most of these files automatically, but backing them up allows for faster recovery.
- [BigFix Server folder]\ClientRegisterData\registrationlist.txt -- Information about the last known IP address of each computer.
- [BigFix Server folder]\Mirror Server\Inbox\bfemapfile.xml -- Information necessary for BigFix Agents to get actions/Fixlets.
- [BigFix Server folder]\sitearchive -- Information necessary for BigFix Agents to get actions/Fixlets.
- [BigFix Server folder]\wwwrootbes\bfsites -- Information necessary for BigFix Agents to get actions/Fixlets.
- [BigFix Server folder]\wwwrootbes\bfmirror\bfsites -- Information necessary for BigFix Agents to get actions/Fixlets.
- [BigFix Server folder]\wwwrootbes\bfmirror\downloads -- Contains the download cache.
- [BigFix Server folder]\wwwrootbes\Uploads -- Contains files uploaded to the system. (Warning: the server cannot recover these files automatically, so they are lost unless you have backed them up.)
- Securely back up site credentials, license certificates, and publisher credentials -- The license.pvk, license.crt, and publisher.pvk files are critical to the security and operation of BigFix. If the private key (pvk) files are lost, they cannot be recovered. These files must be securely backed up.
- Back up the user account information in SQL Server -- The database usernames and privileges are stored in the master database on SQL Server and will need to be restored in the event of a failure (otherwise all logins would need to be recreated). Information on how to back up SQL Server login information is available at: http://support.microsoft.com/kb/246133/.
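The file and folder copies in the list above can be scripted. Below is a minimal Python sketch that copies the listed items from a BigFix Server folder into a new timestamped backup folder, preserving their relative paths. The server folder path, the backup destination, and the `bigfix-backup-...` folder name are assumptions you supply or adjust. The sketch does not back up the databases, the SQL Server logins, or the license.pvk, license.crt, and publisher.pvk files, which are covered by the other steps. It also does not handle files that change while the copy runs.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Items from the backup list, relative to the BigFix Server folder.
SERVER_ITEMS = [
    Path("ClientRegisterData", "registrationlist.txt"),
    Path("Mirror Server", "Inbox", "bfemapfile.xml"),
    Path("sitearchive"),
    Path("wwwrootbes", "bfsites"),
    Path("wwwrootbes", "bfmirror", "bfsites"),
    Path("wwwrootbes", "bfmirror", "downloads"),
    Path("wwwrootbes", "Uploads"),
]


def backup_server_files(server_dir, backup_root, now=None):
    """Copy SERVER_ITEMS into backup_root/<hypothetical timestamped folder>/.

    Returns (destination folder, list of relative paths that were not found).
    """
    now = now or datetime.now()
    dest = Path(backup_root) / now.strftime("bigfix-backup-%Y%m%d-%H%M%S")
    missing = []
    for rel in SERVER_ITEMS:
        src = Path(server_dir) / rel
        target = dest / rel
        if src.is_dir():
            # copytree creates any missing parent folders of the target.
            shutil.copytree(src, target)
        elif src.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also keeps timestamps
        else:
            missing.append(rel)  # report rather than fail on absent items
    return dest, missing
```

Returning the missing items instead of raising lets a nightly job finish and still flag, for example, an absent Uploads folder, which is the one item the server cannot rebuild on its own.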
Recovery Procedure
- Using either the previous BigFix Server computer or a new computer, install SQL Server (use the same version of SQL Server as was previously used). Make sure you are using "Mixed Mode" authentication.
- Ensure that the new BigFix Server computer can be reached on the network using the same URL that is in the masthead file. (For example: http://192.168.10.32:52311/cgi-bin/bfgather.exe/actionsite OR http://bigfixserver.company.com:52311/cgi-bin/bfgather.exe/actionsite).
Important: To avoid issues where the BigFix Clients connect to the BigFix Server before it is fully restored, it is best to make sure the BigFix Server is not available on the network until the migration is complete (that is, don't push the DNS update until the migration is complete).
- Install the BigFix Server component using the masthead file.
- Stop all BigFix Server services (BigFix FillDB, BigFix GatherDB, BigFix Gather, BigFix RootServer, BigFix Web Reports Server).
- The BigFix Server installation will create new BFEnterprise and BESReporting databases. Delete these databases using SQL Server Enterprise Manager and then restore the databases from the backups.
- Restore the SQL Server login information: http://support.microsoft.com/kb/246133/.
- Restore the backed up files/folders (overwriting any existing files).
- Start all the BigFix Server services.
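Once the restored server is back on the network, a plain TCP connection test against the host and port in the masthead URL can help confirm the reachability step above. The following is a minimal Python sketch using only the standard library; `masthead_endpoint` and `can_reach` are hypothetical helper names. A successful connection only shows that something is listening on that host and port, not that the BigFix Server is healthy, so the checks under Verification of Restoration still apply.

```python
import socket
from urllib.parse import urlsplit


def masthead_endpoint(url):
    """Extract (host, port) from a masthead URL such as
    http://bigfixserver.company.com:52311/cgi-bin/bfgather.exe/actionsite."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Use the scheme's default port only if the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_reach(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = masthead_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name did not resolve
        return False
```

Running `can_reach(...)` from a computer on the same network segment as your BigFix Clients, with the exact URL from the masthead, tests the same name resolution and port the clients will use.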
Verification of Restoration
To make sure that your BigFix Server has been successfully restored, perform the following steps:
- Check the BigFix Diagnostics to make sure all services are properly started.
- Log in with the BigFix Console and verify that logins work properly and that the database information was restored properly.
- BigFix Clients and BigFix Relays should soon detect that the server is available and begin reporting data to it. Full recovery, with all agents reporting, usually takes anywhere from a few minutes to many hours, depending on the size of the deployment and how long the server was unavailable. In any case, at least some agents should be reporting updated information within an hour or so.
- After verifying that some agents are reporting properly, send a "blank action" to all computers (Tools > Take Custom Action, target "All Computers", click OK). The blank action makes no changes to the agent computers, but the agents report that they received it. If most agents respond to the blank action, that is a very strong indicator that everything is working well, because sending an action tests many core components and communication paths of BigFix.
- Log in to Web Reports and ensure the data was restored properly.
- Contact BigFix Technical Support with any issues or questions.
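To put a number on how many agents are reporting after the restore, you could export each computer's last report time (for example, from a Console or Web Reports listing; the export step and its format are assumptions here) and compute the share that has reported since the restore finished. The following is a minimal Python sketch of that calculation; `reporting_since` is a hypothetical helper, and the same calculation works for blank-action responses if you record which computers reported the action.

```python
from datetime import datetime


def reporting_since(last_reports, restored_at):
    """Return (fraction, sorted names) of computers that reported at or after restored_at.

    last_reports maps computer name -> last report time (datetime),
    or None when no report time is known.
    """
    if not last_reports:
        return 0.0, []
    reported = sorted(
        name
        for name, when in last_reports.items()
        if when is not None and when >= restored_at
    )
    return len(reported) / len(last_reports), reported
```

Tracking this fraction over the hours after the restore shows whether reporting is still climbing toward full recovery or has stalled, which is a useful detail to include when contacting BigFix Technical Support.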
