Search Engine and Online Marketing Blog
  
Welcome to the ineedhits Search Engine Marketing blog, where we share the latest search engine and online marketing news, releases, industry trends and great DIY tips and advice.

We encourage you to get involved in our blog community - so share your opinions and experiences by leaving comments on our posts.

If you're looking for help with promoting your website - check out our range of affordable search engine marketing services.


Tuesday, June 20

Disaster Recovery Planning - When you should be Planning to Fail

In one of my lectures at university, the lecturer said something about strategic planning that has always stayed with me. That was:

"Many businesses fail to plan but few businesses plan to fail".

Well, I am saying that you should have a plan for when things fail - your Disaster Recovery Plan (DRP). A DRP is something that any company that has production IT systems should have in place.

DRP is not just about having a back up plan for data. It goes further, addressing what you do with the data that you have backed up and what hardware you put it on. It takes into account the likely costs and associated business issues that come with downtime. It is not an IT document but rather a business document.

Let me provide a real life example, which highlights the value of having a plan in place and how "Murphy's Law" can apply at times:

On Tuesday 13th June 2006, the ineedhits.com website suffered a number of hours of downtime whilst we implemented our own disaster recovery plan, at an individual server level. Whilst the overall circumstances are quite complex, they are best understood by looking at the timeline below:

4pm Friday 9th June: ineedhits' production SQL server reported failure on a RAID 5 drive. For those people with an IT background, you will know that RAID 5 offers a level of redundancy that allows for one disk in the array to fail without any issue.
A new disk was ordered and, under our agreement with our hardware manufacturer, would be delivered next business day.
Unfortunately, the next business day was Tuesday 13th June 2006, due to a localized public holiday where our data centre is located, which is in a different state from ineedhits' head office.
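
For readers without an IT background, the single-disk redundancy mentioned above comes from parity. The following is a minimal sketch of the idea only, not a description of our actual controller: the block contents are made up, and a real RAID 5 array rotates parity across all disks rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of three data blocks (hypothetical contents) plus its parity block.
data = [b"ORDERS__", b"INVOICES", b"ACCOUNTS"]
parity = xor_blocks(data)

# One disk lost: XOR of the survivors (data + parity) gives back the missing block.
lost_index = 1
survivors = [block for i, block in enumerate(data) if i != lost_index] + [parity]
recovered = xor_blocks(survivors)

# Two disks lost: the survivors only yield the XOR of the two missing blocks,
# which is not enough to recover either of them.
two_lost = xor_blocks([data[0], parity])

print(recovered == data[lost_index])
print(two_lost in (data[1], data[2]))
```

This is why the first drive failure was survivable on its own, and why a second failure before the replacement arrived was not.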

2pm Monday 12th June: the same server reported a high probability of failure on one of the "mirrored" drives within this machine. A call was logged raising the urgency of the replacement drive(s), however the public holiday again slowed progress.

12 noon Tuesday 13th June: A second drive in the array reported failure. The machine stopped responding. The maintenance banner was placed on the site whilst we examined our options.
Our plan called for a full copy of the database to be copied down to our alternative data centre via a secure VPN tunnel. Even with a high speed link, this took multiple hours to achieve, finishing in the very early hours of the morning. In the meantime, we double checked the security and patch levels on our back up SQL server and brought them up to date.
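
It is worth estimating this copy time before you need it. The sketch below is a rough back-of-the-envelope calculation only; the database size, link speed and efficiency figure are hypothetical and are not the numbers from our incident.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.7):
    """Estimate the hours needed to copy size_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is actually
    achieved once VPN and protocol overhead are taken into account.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    bits_to_send = size_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


# Hypothetical example: a 40 GB database over a 10 Mbps link at 70% efficiency.
print(f"{transfer_hours(40, 10):.1f} hours")
```

If the estimate comes out longer than the downtime your business can accept, that is a sign the plan needs a faster link, a smaller copy, or a standby copy that is already in place.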

Wednesday 14th June 2006: The restore was completed on Wednesday morning and site connectivity restored.
The first hardware technician replaced one of the failed drives in the array. Unfortunately this person was a Tier 1 level support person and did not have a great deal of experience or knowledge.

Thursday 15th June 2006: A more experienced Tier 2 support engineer arrived and replaced the SCSI backplane, as well as the failed mirror drive. He used his initiative and brought the backplane, as two drives failing in a server less than 4 months old (from a name brand vendor) is highly unusual.
A rebuild of the array was commenced.

Friday 16th June 2006: The rebuild of the array completed but showed corruption of the data on the drive.
The decision was made to rely on backups and continue running on our alternative data centre until the main production server reliability could be assured.
As such, a 60 hour long "stress test" was applied to this server over the weekend.

Monday 19th June 2006: With confidence restored in the server after passing the stress test, the entire process that started on Tuesday 13th June and finished on Wednesday 14th June had to be reversed.

Tuesday 20th June: All systems appear to be up and running. However, if you are experiencing an issue, I strongly urge you to contact the ineedhits customer care team and they will gladly assist.

I would like to stress that ineedhits' data has not been compromised by an external party. All data remains in a secure encrypted state. Thanks to having our plan in place, we were able to proceed with an acceptable downtime and with minimal disruptions. It is always highly regrettable when our site is unavailable.

For that - and perhaps most importantly - I'd like to apologize to all our customers and site visitors for the inconvenience that this downtime may have caused. If you have any questions about orders you placed between Sunday 11th June and Wednesday 14th June 2006, please contact our customer care team!

Some hints / lessons learnt with DRP:

  1. Do not assume that you are immune from a failure. Plan to fail - at least in an IT sense.
  2. Check the definition of what is a "business day". Generally this refers to the business day where your data centre is located and not the region where you sign your contract.
  3. Check your maintenance contracts extremely carefully. Generally they will have phrases such as "commercially best efforts" in there with regard to replacement parts, unless you have paid extra.
  4. A maintenance contract is important - however it is only as good as the person responding to the call out. It is a matter of luck as to the level of experience, knowledge and customer focus that the person who responds to your call will have. Be thankful when you get a good person and do the right thing by them and acknowledge them to their management. If you get a person who is not at the expected level, then this feedback also needs to be provided in a calm and rational way.
  5. If you have a database, make sure you know HOW this data is being backed up. Is it a point-in-time back up or a real-time one? Point in time means that a "snapshot" of the database is taken at a set time (generally once a day). Should a failure occur, you roll back to that snapshot and lose everything written since. Real time means that every transaction to the database is backed up as it happens.
  6. Practice your DRP! I strongly suggest you practice your DRP to make sure the plan is feasible.
  7. Communicate as effectively as you can to all parties who have vested interests during downtime. This helps ensure expectations are being set correctly.
  8. Be realistic - there is no point aiming for 10 minutes of downtime if it takes 30 minutes for your back up server to be turned on, mount its logical drives and be ready to start taking orders. See point 6.
  9. Identify which systems depend on others and use this to identify single points of failure, e.g. what if your firewall goes down? Does this mean your email, which is also your fax server, stops working, and is that your primary way of accepting orders?
  10. Finally, if you have to put your DRP into place, determine how effective it was.
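
As a rough illustration of point 9, the dependency check can be sketched in a few lines. The system names and the dependency map below are hypothetical, loosely based on the firewall example, and the model assumes every listed dependency is required, with no redundancy.

```python
def impacted_by(failed, depends_on):
    """Return every system that stops working when `failed` goes down.

    depends_on maps each system to the systems it needs in order to run.
    """
    impacted = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for system, requirements in depends_on.items():
            if current in requirements and system not in impacted:
                impacted.add(system)
                frontier.append(system)
    return impacted


# Hypothetical dependency map for a small online business.
depends_on = {
    "email": ["firewall"],
    "fax": ["email"],
    "order_intake": ["fax"],
    "website": ["firewall", "sql_server"],
}

for component in ("firewall", "sql_server"):
    print(component, "->", sorted(impacted_by(component, depends_on)))
```

Any component whose failure takes order intake down with it is a candidate single point of failure and deserves its own entry in the plan.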

Do not fall into the trap of thinking that it won't happen to you or that DRP is not for small businesses. It is!

If you plan to fail you will also plan to get back up and running!


Posted by Warren Duff at 6:03 AM GMT



2 Comments
At 12:37 PM, Anonymous Robin said...

I did face the problem of a disk crash and I never thought it would be so difficult to get to the solution. After much effort I sent it to Disk Doctors Labs Inc, where my disk was recovered.

 
At 2:28 AM, Blogger Warren Duff said...

Hi Robin,

We looked at that as an option but the cost of doing that was extremely high. Each of the disks in the array would have had to be provided to the data recovery expert, who then charges "per meg" of data recovered.

With costs of IDE hard drives coming down, I strongly recommend people use mirror drives in their machines or even back up to USB thumb drives. Simple and cheap.

Warren

 
 

Member of SEMPO (Search Engine Marketing Professional Organisation)
ineedhits, ineedhits.com, and their designs, logos, and
related marks are trademarks of Ineedhits.com Pty Ltd.
© Copyright 1999 - 2008 ineedhits.com