TEM Distributed Server Architecture (DSA) lets you install multiple TEM Servers that replicate information from each other for the purpose of disaster recovery. If one TEM Server fails, the other TEM Servers automatically take over as fully functional TEM Servers: they receive data from the TEM Relays and TEM Clients and accept TEM Console connections. When the failed TEM Server is restored, it automatically receives updated information.
The health of a DSA deployment depends on the health and efficiency of the database replication process performed by the FillDB service. If actions have been propagated (in the Console) and the database has replicated successfully (see the Replication tab in the TEM Admin tool), the actions will run appropriately on all child endpoints.
However, if the primary DSA server is disabled after an action is propagated but before the database has replicated, the Secondary will NOT have the newly propagated actions. In this case, children of the Secondary will not be provided the version of the actionsite that contains the changes made before replication completed, and so the Secondary will not receive the new actions or the downloads associated with them. The desired actions would need to be taken again from the Secondary.
In all cases clients will continue to be provided an actionsite (containing open actions) for gathering and you can continue to manage the deployment and take new actions from the Secondary server.
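The replication gap described above can be illustrated with a toy model. This is only a sketch: the `Server` class and both functions are hypothetical names invented for illustration and do not correspond to any TEM API. It simply tracks which actions the Secondary knows about at the moment the Primary fails.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy stand-in for a DSA server; holds the IDs of actions it knows about."""
    name: str
    actions: set = field(default_factory=set)


def replicate(source: Server, target: Server) -> None:
    """Copy every action the source has to the target (one replication pass)."""
    target.actions |= source.actions


def actions_to_retake(issued: set, secondary: Server) -> set:
    """Actions issued on the Primary that never reached the Secondary."""
    return issued - secondary.actions


primary = Server("primary")
secondary = Server("secondary")

primary.actions.add("action-1")
replicate(primary, secondary)      # action-1 reaches the Secondary
primary.actions.add("action-2")    # Primary fails before the next replication pass

missing = actions_to_retake({"action-1", "action-2"}, secondary)
print(sorted(missing))
```

In this model only the action issued after the last replication pass has to be taken again from the Secondary.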
The term 'high availability' refers to disaster or event recovery that is immediate. In that sense, a secondary DSA server is always immediately available for deployment management. However, the actionsite and content data replicated between the Primary and Secondary servers should not be considered 'high availability', because it depends on a database replication process rather than a real-time, load-balanced, concurrent data process.
See the TEM Administrator's Guide at http://support.bigfix.com/resources.html for more information about DSA.
Multiple servers can provide a higher level of service for your TEM installation. If you choose to add Distributed Server Architecture (DSA) to your TEM installation, you will be able to recover from network and systems failures automatically while continuing to provide local service. To take advantage of this functionality, you will need one or more additional servers with a capability at least equal to that of your primary server. Because of the extra expense and installation involved, you should carefully think through your needs before committing to DSA.
First, you must decide how you want your TEM Servers to communicate with each other. There are three inter-server authentication options: the first two are flavors of NT Authentication and the third is SQL Authentication. Because it is more secure, IBM recommends NT Authentication. You can't mix and match; all TEM Servers must use the same authentication method. Here are the instructions for each option:
Using NT Authentication with Domain Users/User Groups
When using this technique, each TEM Server uses the specified domain user or a member of the specified user group to access all other TEM Servers in the deployment. To authenticate your TEM Servers using Domain Users/User Groups, follow these steps:
Using NT Authentication with Domain Computer Groups
When using this technique, each TEM Server is added to a specified domain computer group and each server accepts logins from members of that domain group. To authenticate your TEM Servers using Domain Computer Groups, follow these steps:
Using SQL Authentication
When using this technique, each TEM Server is given a login name and password, and is configured to accept the login names and passwords of all other TEM Servers in the deployment. Be aware that the password for this account is stored in clear-text under the HKLM branch of the registry on each TEM Server. To authenticate your TEM Servers using SQL Authentication, follow these steps:
Note: This choice must be made on a deployment-wide basis; you cannot mix domain-authenticated servers with SQL-authenticated servers. Also, all TEM servers in your deployment must be running the same version of SQL Server.
Before proceeding with this section, determine your authentication method and complete the appropriate steps in the preceding Authenticating Additional Servers (DSA) section.
For each additional TEM Server you wish to add to your deployment, make sure they are communicating with each other, and then follow these steps:
The Replication tab of the TEM Admin tool is the only way to properly verify successful replication between Servers. The tool reports important information such as Server, Distance, Expected Latency, Last Replication Time, and Last Error Message, each of which can be used to troubleshoot issues.
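As a rough illustration of how those columns can be read together, the sketch below models one row of the Replication tab and flags rows worth a closer look. Everything here is an assumption made for illustration: the `ReplicationRow` type, the field types, the server names, and the five-minute slack are hypothetical, and the values would have to be transcribed from the tool by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReplicationRow:
    """One row of the Replication tab, transcribed by hand (field types are assumed)."""
    server: str
    distance: int
    expected_latency: timedelta
    last_replication_time: Optional[datetime]
    last_error_message: str = ""


def needs_attention(row: ReplicationRow, now: datetime,
                    slack: timedelta = timedelta(minutes=5)) -> bool:
    """Flag a row that reports an error, has never replicated, or is overdue."""
    if row.last_error_message.strip():
        return True
    if row.last_replication_time is None:
        return True
    overdue_after = row.expected_latency + slack
    return now - row.last_replication_time > overdue_after


now = datetime(2024, 1, 1, 12, 0)
rows = [
    ReplicationRow("tem-primary", 0, timedelta(minutes=5), now - timedelta(minutes=2)),
    ReplicationRow("tem-secondary", 1, timedelta(minutes=5), now - timedelta(hours=1)),
]
for row in rows:
    print(row.server, "needs attention" if needs_attention(row, now) else "ok")
```

The overdue rule (expected latency plus some slack) is a design choice of this sketch, not a threshold defined by TEM.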
If you believe you are experiencing an error, you can troubleshoot further by referring to Filldb.log, located by default in the following location: C:\Program Files\BigFix Enterprise\BES Server\FillDBData
Note: Please be patient as initial replication can and will take time depending upon database size and latency between Servers.
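When the log is large, a small script can pull out the lines of interest. The following is a minimal sketch, assuming only that the log is plain text; the function name is hypothetical, the default search string is the replication error message quoted later in this document, and the sample lines are made up rather than real Filldb.log output.

```python
from typing import Iterable, List, Tuple

# Message text quoted elsewhere in this document; add other strings as needed.
DEFAULT_NEEDLES = ("Replication was interrupted to process server database insertions",)


def find_log_hits(lines: Iterable[str],
                  needles: Iterable[str] = DEFAULT_NEEDLES) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line containing any needle, ignoring case."""
    lowered_needles = [n.lower() for n in needles]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if any(n in text.lower() for n in lowered_needles):
            hits.append((number, text))
    return hits


# Made-up sample lines; real Filldb.log entries may look different.
sample = [
    "starting replication pass",
    "Replication was interrupted to process server database insertions.",
    "replication pass finished",
]
hits = find_log_hits(sample)
print(hits)
```

On a TEM Server, the same function can be given an open file handle for Filldb.log instead of the sample list, since a text file iterates line by line.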
If you have a Distributed Server Architecture (DSA) deployment and one of the TEM Root Servers has been removed from the deployment, you can mark it as deleted so it won't show up in TEM Admin. You can use the delete_replication_server procedure stored on the BFEnterprise database to remove a TEM Server. Be careful not to delete the wrong server, or you may lock yourself out. Here's how to proceed:
A. Open the BFEnterprise database with sa rights and open a new query window. Enter the following query, which deletes a server with the name of MyRootServer:
declare @serverid int
select @serverid = (select ServerID from REPLICATION_SERVERS where DNS like '%MyRootServer%')
exec delete_replication_server @serverid
B. Restart the TEM Admin tool to update it with the changes.
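Because the query selects the server with a substring match on its DNS name, it is worth confirming beforehand that the name fragment identifies exactly one server. The sketch below illustrates that check on a hand-copied list of DNS names. The function names and host names are hypothetical, and the matching is only an approximation of LIKE: case is ignored here (SQL Server's behavior depends on the collation) and LIKE wildcards inside the fragment are not interpreted.

```python
from typing import Iterable, List


def matching_servers(dns_names: Iterable[str], fragment: str) -> List[str]:
    """Names containing the fragment, roughly what LIKE '%fragment%' would match."""
    needle = fragment.lower()
    return [name for name in dns_names if needle in name.lower()]


def identifies_one_server(dns_names: Iterable[str], fragment: str) -> bool:
    """True only when the fragment matches exactly one DNS name."""
    return len(matching_servers(dns_names, fragment)) == 1


# Hypothetical DNS names, as they might be copied from the REPLICATION_SERVERS table.
names = ["myrootserver.example.com", "myrootserver2.example.com"]

print(matching_servers(names, "MyRootServer"))
print(identifies_one_server(names, "MyRootServer"))
print(identifies_one_server(names, "myrootserver2"))
```

Here the fragment "MyRootServer" is ambiguous because it matches both names, while "myrootserver2" matches only one.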
Download the BESRemove utility from the BigFix site: http://support.bigfix.com/bes/install/downloadutility.html#besremove
Depending on which authentication method was used during installation (NT Authentication or SQL Authentication), perform the following:
For NT Authentication:
For SQL Authentication:
Restart the TEM FillDB service on the Master server.
For the failover process to succeed, you must set the DSA Server as the Secondary Relay on the top-level Relays, either manually with the __RelayServer2 client setting or through the Console's computer right-click settings interface. When the Primary TEM Server fails and lower-level Relays are unable to report, they use the Secondary Relay value during the normal relay selection process to find and report to the Secondary Server.
Note: The failover process is not immediate. It depends on the _BESClient_RelaySelect_ResistFailureIntervalSeconds setting, which is 10 minutes by default on top-level TEM Relays. A properly configured Relay architecture allows the entire deployment to begin to fail over in about 10 minutes by default. Deployments that do not have a Relay infrastructure set up can take significantly longer.
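The two client settings named above can be collected as name/value pairs for a top-level Relay. This is a minimal sketch with assumptions: the helper name is hypothetical, the Secondary Relay value is passed in unchanged because its exact format is not covered here, and storing the interval as a string of seconds is an assumption based on the setting's name.

```python
def failover_settings(secondary_relay_value: str, resist_failure_minutes: int = 10) -> dict:
    """Build the failover-related client settings for a top-level Relay."""
    if resist_failure_minutes <= 0:
        raise ValueError("resist_failure_minutes must be positive")
    return {
        # Points the Relay at the DSA Server as its Secondary Relay.
        "__RelayServer2": secondary_relay_value,
        # The setting name ends in Seconds, so convert minutes to seconds.
        "_BESClient_RelaySelect_ResistFailureIntervalSeconds": str(resist_failure_minutes * 60),
    }


# Placeholder value; substitute whatever your deployment uses for the Secondary Relay.
settings = failover_settings("<secondary relay value>")
for name, value in settings.items():
    print(f"{name} = {value}")
```

With the default of 10 minutes, the interval comes out as 600 seconds.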
If Message Level Encryption is enabled and clients are configured using "Task: BES Client Setting: Encrypted Reports", the Server's encryption key should be moved to the Secondary DSA Server. This allows the DSA Server to process reports from encrypted clients during normal operations or in the event of an outage on the Primary Server.
If you are using Distributed Server Architecture (DSA) and replication is failing with the error message 'Replication was interrupted to process server database insertions.' in the BES Administration tool, you'll need to raise the maximum amount of time spent doing replication on the TEM Server that is failing.
To increase the maximum replication time, set the following registry key on the TEM Server.
* UnInterruptableReplicationSeconds (DWORD): Seconds
Note: You must restart the FillDB service for changes to take effect.
Raising the value lets the TEM Server spend more time performing replication each time it attempts to replicate, as determined by the replication interval. The error occurs because the TEM Server is unable to complete replication within the default time.
For larger deployments of TEM, try a value of 60-120 seconds. If you are installing a new TEM Server, you might raise the value to 300-600 seconds during the initial replication period to reduce the amount of time spent initializing the new TEM Server.
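The suggested ranges can be captured in a small helper that keeps a chosen value inside the range for a given situation. This is only a sketch: the scenario labels and the function name are hypothetical, and the numbers are the ones suggested above, not limits enforced by TEM.

```python
from typing import Optional

# Suggested ranges in seconds, taken from the guidance above.
SUGGESTED_RANGES = {
    "large_deployment": (60, 120),
    "initial_replication": (300, 600),
}


def suggest_replication_seconds(scenario: str, preferred: Optional[int] = None) -> int:
    """Return a value for UnInterruptableReplicationSeconds within the suggested range."""
    low, high = SUGGESTED_RANGES[scenario]
    if preferred is None:
        return low  # start at the low end and raise it if the error persists
    return max(low, min(high, preferred))  # clamp the preferred value into range


print(suggest_replication_seconds("large_deployment"))
print(suggest_replication_seconds("initial_replication", preferred=900))
print(suggest_replication_seconds("large_deployment", preferred=90))
```

Starting at the low end of a range and raising the value only if the error persists is a conservative choice made in this sketch, not a documented recommendation.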