Rethinking Storage with Big Data

The annual Spectra sales kickoff meeting just wrapped up. I am working on getting this entry done before I leave for a week’s vacation. It was an interesting few days of discussions and presentations. There were a lot of outside speakers this year, which really added to the variety of topics and viewpoints.

I ended up on the schedule three times this year. I would talk to my agent about when they were scheduled, if only I had one. One of my sessions was an update on the Big Data market. As I got up to start the presentation, I was surprised at how well it tied into everything we had already heard. It seems most market segments and verticals are facing a Big Data challenge of one type or another.

Most of the external conversations about Big Data seem to immediately go to analytics. I think it is fascinating how much we can derive and learn from data we already have. As we learn more about getting value from our data, we want even more data to analyze. That is why I am somewhat surprised that so little of the conversation focuses on the storage problems of Big Data.

I talked with the sales team this week about how Big Data is changing the storage rules. There are a lot of things that work on a 10 TB data set that are not practical with a 1 PB data set. As you amass hundreds of Terabytes of data, and start heading toward a Petabyte, you need to look at the basic questions again: How do I store it? How do I protect it? How do I move it? The implications of these questions when viewed at the Petabyte level are interesting.

If the data does not change much, moving these big data sets into an active archive can help. In an archive, disk and tape both serve as primary storage systems. What a lot of people don't initially consider is that tape might be the best primary storage platform for these data sets. Nothing beats the TCO of tape, making it the most affordable storage platform for these large data sets. Bandwidth isn't just expensive; there simply isn’t enough of it to replicate Petabyte data sets. Tape's native portability makes it possible to move data at massive scale. Static data written to two different tapes does not need traditional weekly backups. This might just solve a few problems.
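
To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch in Python. The link speeds and the 80% sustained efficiency are my own illustrative assumptions, not measurements from any particular site, and the function name is just made up for this example. It only estimates how long it takes to push a data set over a network link.

```python
# Rough transfer-time estimate. All inputs below are illustrative assumptions.
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def transfer_days(data_bytes, link_bits_per_sec, efficiency=0.8):
    """Days needed to move data_bytes over a link.

    efficiency is the assumed fraction of the nominal link rate that is
    actually sustained (0.8 is a guess, not a measured value).
    """
    seconds = (data_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for size_label, size in [("10 TB", 10 * TB), ("1 PB", PB)]:
        for link_label, rate in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
            days = transfer_days(size, rate)
            print(f"{size_label} over {link_label}: {days:.1f} days")
```

Under these assumptions, 10 TB over a dedicated 1 Gb/s link takes a bit over a day, while 1 PB over the same link takes roughly 116 days. Even at 10 Gb/s it is still more than 11 days, and that is with nothing else sharing the link.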

As a guy with years of experience with disk, it is interesting to look at the basic questions again in the era of Big Data. I think tape is a good fit in some areas while disk fits others, and I look forward to exploring that more in the future. But not any more today. I am off to ride my bike across Iowa.