Primary Disk Deduplication’s Impact on Backup

George Crump posed an interesting question when he asked if primary storage deduplication will kill archive and backup.  It is a great question, and one we should explore.  If you don’t want to read my ramblings, my short answer is no. 

There is a lot more to archive and backup than simply storing a lot of data, something deduplication has proven it can do well.  Backed-up and archived data needs to be cataloged, indexed, and managed throughout its retention period.  That’s one of the reasons we don’t use tar and dump commands much these days.  Snapshots can remove much of the recovery burden from alternate storage devices.  I have seen customers perform almost all of their single-file restores from snapshots.  But snapshots have never served as a replacement for backups.  As George said, we sleep better at night when copies of our data are on different systems.  There are lots of reasons for that.  We all worry about a bad firmware load.  If you have all your data on one array (or replicated to an identical one), a bad firmware release could wipe you out.  And of course there are physical failures.  No matter how well designed a system is, something external can happen.  In the years I was in the field, I heard some unbelievable failure stories in which a non-IT event started the failure.  (Maybe I should start collecting them.)

This leads me to conclude that a properly architected data storage environment includes dissimilar storage devices.  Your backup and DR copies need to be independent of production data, so that a cascading failure cannot take out every copy.  For archive, the first copy could be on the primary storage platform, but the redundant copies (and all good archive systems maintain a minimum of two copies of the data) need that same independence.  It could be as easy as Spectra nTier disk and Spectra TSeries tape, or it could be more complex.  What it won't be is a single disk array for primary storage, archive and data protection.
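
For readers who like to see a rule written down, here is a minimal Python sketch of the placement rules described above. It is only an illustration, not a tool from any product: the `Copy` type, the `placement_problems` function, and the platform labels such as "array-a" are all hypothetical names, and "platform" is assumed to mean a device family that shares firmware and hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    role: str      # "primary", "backup", "dr", or "archive" (hypothetical labels)
    platform: str  # assumed to identify a device family / firmware line


def placement_problems(copies):
    """Return a list of rule violations; an empty list means the rules hold."""
    problems = []
    primary_platforms = {c.platform for c in copies if c.role == "primary"}

    # Backup and DR copies must not share a platform with production data.
    for c in copies:
        if c.role in ("backup", "dr") and c.platform in primary_platforms:
            problems.append(f"{c.role} copy shares platform '{c.platform}' with primary")

    archive = [c for c in copies if c.role == "archive"]
    if archive:
        # A good archive keeps at least two copies.
        if len(archive) < 2:
            problems.append("archive has fewer than 2 copies")
        # Only the first archive copy may live on the primary platform.
        on_primary = [c for c in archive if c.platform in primary_platforms]
        if len(on_primary) > 1:
            problems.append("more than one archive copy sits on the primary platform")

    return problems


# One array doing everything: both rules are violated.
single_array = [
    Copy("primary", "array-a"),
    Copy("backup", "array-a"),
    Copy("archive", "array-a"),
    Copy("archive", "array-a"),
]

# Dissimilar devices: production on one array, backup on another disk system,
# second archive copy on tape.
mixed = [
    Copy("primary", "array-a"),
    Copy("backup", "disk-b"),
    Copy("archive", "array-a"),
    Copy("archive", "tape-c"),
]

for problem in placement_problems(single_array):
    print(problem)
print(placement_problems(mixed))
```

The check deliberately says nothing about how different two platforms must be to count as independent; that judgment (different vendor, different media, different firmware line) is left to whoever assigns the platform labels.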