Texada Software SaaS Disaster Recovery Plan

Describes Texada Software's SaaS (Software as a Service) disaster recovery plan (DRP).

Introduction

This document describes Texada’s Backup and Disaster Recovery Plan. It is a high-level overview of the steps that Texada takes to ensure the security and protection of customer data within our SaaS applications.

SRM Server

Description

These servers host SaaS customers’ SRM-related data, and getting them up and running is of the utmost priority. They run the SRM application and the Oracle database. The database always runs in ARCHIVELOG mode, which enables us to restore and recover it to any point in time.

Backup Types

Each SaaS server is backed up in the following ways:

On-disk Oracle backups – each night we perform a hot backup of the Oracle Production database. This backup is initially stored on disk. Once completed, it is also backed up to Rogers’ cloud backup solution within the same data centre.

These backups are tested in a fire drill on the first Wednesday of every month, in which the data is pulled back from the Rogers backup and validated by a DBA.
Restoration times are about 2 hours on average.
Attachments – Attachments as defined within SRM are backed up nightly to Amazon S3 cloud storage. Backups are retained in S3 for 30 days, after which they are moved to Amazon Glacier storage. This means that files that are older than 30 days might take longer to restore if needed.

The attachment directory is also backed up nightly by Rogers in the same data centre. A complete restoration of the attachment directory takes around 5 hours. Restoration of individual files is significantly faster.
Other Data, Application, and Operating System Files – these are backed up on a nightly basis by Rogers to their cloud backup solution in the same data centre.
A complete restoration of company application files has been fire drilled and takes approximately 5 hours.
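
The fire drill schedule and the attachment retention rule described above can be sketched in a few lines of Python. This is only an illustration, not part of Texada's tooling: the function names are hypothetical, and treating a backup that is exactly 30 days old as still being in S3 is an assumption, since the plan only says backups move to Glacier after 30 days.

```python
import datetime

S3_RETENTION_DAYS = 30  # from the plan: backups stay in S3 for 30 days


def first_wednesday(year: int, month: int) -> datetime.date:
    """Return the first Wednesday of the given month (the fire drill day)."""
    first = datetime.date(year, month, 1)
    # date.weekday() numbers Monday as 0, so Wednesday is 2.
    offset = (2 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


def attachment_storage_tier(age_days: int) -> str:
    """Return where an attachment backup of the given age is expected to live.

    Assumption: a backup that is exactly 30 days old is still in S3.
    """
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    return "S3" if age_days <= S3_RETENTION_DAYS else "Glacier"
```

For example, first_wednesday(2025, 2) gives 5 February 2025, and attachment_storage_tier(45) gives "Glacier", which signals that a restore of that backup might take longer.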

Failure/Restore Types

Hard Drive Failure

The disks that house the database and the application are configured in RAID 5 and are hot swappable.
While the array is being rebuilt, customers might notice that the server’s response time is slower than usual, but the impact should be minimal.
This has been validated on servers in house. The rebuild window is approximately 2 hours for the application array and approximately 1 hour for the database array.
The /attach directory is housed on a single 2TB disk with no redundancy. If that drive fails, the disk will be replaced, and the data will be restored from backup.
Once the disk is replaced and the directory structure is rebuilt, the attachment directory will be usable for new attachments.
Files previously housed in the attachment directory will not be available until the backup restoration is complete. This restoration has been validated and takes approximately 5 hours.

Corrupt Files

The level of corruption will be determined in conjunction with the customer and our development team.
Once the level of corruption is determined, we will determine the number of files that need to be restored and implement the fastest restore option. This could mean restoring from the most recent backup on disk, which is the fastest option. If an older version is needed, we might need to pull the backup from cloud storage, in which case the restore might take longer.
The length of the outage and the number of customers affected depend on what has become corrupt.
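
The choice between the two restore sources can be sketched as a simple rule. This is a minimal illustration under an assumption the plan does not state: that an older version is needed whenever the most recent on-disk backup was taken after the corruption began, because that backup would then contain the corrupt files. The function name and return values are hypothetical.

```python
from datetime import datetime


def choose_restore_source(corruption_started: datetime,
                          latest_disk_backup: datetime) -> str:
    """Pick the fastest backup source that predates the corruption.

    Assumption: a backup taken at or after the moment corruption began
    is unusable, so an older copy must come from cloud storage.
    """
    if latest_disk_backup < corruption_started:
        # The on-disk backup is clean and is the fastest option.
        return "on-disk backup"
    # The on-disk backup may already contain corrupt files.
    return "cloud backup"
```

In practice the level of corruption is determined together with the customer and the development team, so a rule like this would only inform that decision.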

Complete Server Failure

In the event of a complete server failure, we will need to engage with the data centre and Rogers support to have the hardware replaced.
Once the hardware has been supplied by Rogers, our team will need to restore Oracle, SRM and all data files.
The servers are covered under a Dell same-day service warranty, and the hardware should be back up within a maximum of 48 hours.
With the current disaster recovery strategy, the restoration of files and applications will take up to 24 hours to complete.
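
As a rough worked example, the two figures above can be combined into a worst-case outage estimate. This sketch assumes the two phases run strictly one after the other, with restoration starting only once the replacement hardware is in place. The plan does not state this explicitly, and the names below are hypothetical.

```python
HARDWARE_REPLACEMENT_MAX_HOURS = 48  # from the plan: hardware back up within 48 hours
RESTORE_MAX_HOURS = 24               # from the plan: restoration takes up to 24 hours


def worst_case_outage_hours(hardware_hours: int = HARDWARE_REPLACEMENT_MAX_HOURS,
                            restore_hours: int = RESTORE_MAX_HOURS) -> int:
    """Sum the two phases, assuming they cannot overlap."""
    return hardware_hours + restore_hours
```

Under that assumption the worst case is 48 + 24 = 72 hours. If the hardware arrives sooner, the estimate shrinks accordingly.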