10.6 Running epoch_watchdog

If a VOB replica is restored improperly from backup, divergence can occur in the VOB family. Restoring a replica from backup rolls back its epoch row to the state recorded in the backup. Unless you run the restorereplica command on the replica before resuming development in it, the replica can diverge from the other replicas in the family.

For example, oplogs 1-700 are created in a replica and exported to sibling replicas. The replica is then restored from backup and its epoch number becomes 600 (operations 601-700 occurred after the backup copy was created). If the administrator does not run the restorereplica command, development resumes and new oplogs are created starting with ID 601. These oplogs have the same IDs as oplogs that were exported to other replicas before the restoration, but the operations themselves are different. The restored replica has diverged from the other replicas.
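The ID collision in this example can be illustrated with a short Python model. It is only a sketch of the numbering problem, not of how MultiSite stores oplogs; the function diverged_ids and the operation labels are hypothetical names introduced for the illustration.

```python
def diverged_ids(exported, recreated):
    """Return oplog IDs present in both maps that describe different operations."""
    return sorted(
        oplog_id
        for oplog_id in exported.keys() & recreated.keys()
        if exported[oplog_id] != recreated[oplog_id]
    )


# Oplogs 1-700 were created and exported before the restore.
exported = {i: f"original-op-{i}" for i in range(1, 701)}

# The backup holds epoch number 600. Without restorereplica, new work
# is numbered from 601, reusing IDs that siblings already received.
restored_epoch = 600
recreated = {
    i: f"post-restore-op-{i}"
    for i in range(restored_epoch + 1, restored_epoch + 4)
}

conflicts = diverged_ids(exported, recreated)
print(conflicts)  # [601, 602, 603]
```

Each ID in the result names two different operations: the one that sibling replicas already imported and the one the restored replica created afterward.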

The epoch_watchdog script checks whether a VOB replica's epoch numbers have rolled back without a restorereplica command being run. We recommend that you run this script regularly as a scheduled job on all replica server hosts. For example, the following job runs epoch_watchdog every three hours for all replicated VOBs on the host:

Job.Begin
Job.Id: 20
Job.Name: "epoch_watchdog"
Job.Description.Begin:
Run epoch_watchdog for each replicated VOB on this host.
Job.Description.End:
Job.Schedule.Daily.Frequency: 1
Job.Schedule.StartDate: 3-Sep-2001
Job.Schedule.FirstStartTime: 20:00:00
Job.Schedule.StartTimeRestartFrequency: 03:00:00
Job.DeleteWhenCompleted: FALSE
Job.Task: 105
Job.Args: -all
Job.NotifyInfo.OnEvents: JobEndOKWithMsgs,JobEndFail
Job.NotifyInfo.Using: email
Job.NotifyInfo.Recipients: ms_admin
Job.End

This job uses the MultiSite Epoch Watchdog task, which is defined as follows:

UNIX task:

Task.Begin
Task.Id: 105
Task.Name: "MultiSite Epoch Watchdog"
Task.Pathname: epoch_watchdog
Task.End

Windows task:

Task.Begin
Task.Id: 105
Task.Name: "MultiSite Epoch Watchdog"
Task.Pathname: epoch_watchdog.bat
Task.End

For more information about creating tasks and scheduling jobs, see the schedule reference page in Command Reference and the Administrator's Guide for Rational ClearCase.
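The kind of check that epoch_watchdog performs can be pictured as a comparison against a previously recorded high-water mark. The Python sketch below is a conceptual illustration under that assumption; this section does not describe how the actual script stores or compares epoch numbers, and the function check_epochs, its parameters, and the replica names are hypothetical.

```python
def check_epochs(recorded, current, restored=frozenset()):
    """Return names of replicas whose epoch number fell below the recorded value.

    recorded: replica name -> highest epoch number seen on an earlier run
    current:  replica name -> epoch number read now
    restored: replicas on which restorereplica was run (assumed to be known)
    """
    suspects = []
    for replica, epoch in current.items():
        if replica in restored:
            continue  # a rollback is expected after a proper restoration
        if epoch < recorded.get(replica, 0):
            suspects.append(replica)
    return sorted(suspects)


# Hypothetical values: "boston" was restored from backup without restorereplica.
recorded = {"boston": 700, "london": 450}
current = {"boston": 600, "london": 455}

suspects = check_epochs(recorded, current)
print(suspects)  # ['boston']
```

A replica flagged this way is one where development should stop until the restoration procedure has been completed.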