Forever Data Retention Is NOW

Data growth and data retention are both on the rise. At Spectra Logic we’ve been seeing this for years. In a recent webinar with Storage Switzerland, “Driving Down The Cost of Forever – How To Keep Data For A Long Time”, we saw this confirmed again. When a polling question asked how long attendees needed to keep data, 69% of the respondents indicated 10 years or more.

Forever Data Chart

But this need to retain information has grown beyond simple compliance and regulations. The primary motivation is now money. There’s value in the data we are storing and we want to keep that data so it can be ‘monetized’ later. The problem is that we don’t know which data will be valuable – or when – so we need to keep just about all of it. How can we afford to do this?

Scale Out Disk?

Various disk systems have emerged recently that claim to provide highly scalable capacity together with performance. And from a technology standpoint they may be able to do so. These systems leverage a scale out architecture that enables capacity to be added continuously while performance stays relatively consistent. An increasing number of these scale out systems are now being built around an object-based file system with the ability to hold millions if not billions of files.

The Problem with Scale Out Disk

The problem with scale out disk is that even if you got the disk drives for free, you couldn’t afford to keep the data on them for 10 years or more. Scale out systems scale by using a cluster of storage nodes, and all of that storage must be powered on all the time. Spin-down drives could be used in the nodes, but erasure coding and long term data integrity methods would keep those drives busy enough to minimize the number of times that the cluster could move to that lower power state. Even if these systems could somehow be powered down, the surge to power everything back up when a data set needed to be recalled could be disastrous.

In addition to the disk array power consumption, the data center floor space and the cooling of these systems make the cost of a decade or longer data retention untenable. In other words, with scale out storage it’s not the cost of the TB of storage upfront, it’s the cost to power and cool that TB month after month and year after year.
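The arithmetic behind that claim can be sketched in a few lines. This is a rough illustration only: the function name and every input value below are hypothetical placeholders chosen to exercise the formula, not measurements of any real system, and floor space is left out entirely.

```python
def lifetime_cost_per_tb(purchase_per_tb, watts_per_tb, cooling_overhead,
                         price_per_kwh, years):
    """Return (upfront, operating) cost of keeping one TB online for `years`.

    cooling_overhead is the extra energy spent on cooling, expressed as a
    fraction of the storage power draw (0.5 means 50% extra).
    """
    hours = years * 365 * 24
    kwh = watts_per_tb / 1000 * hours * (1 + cooling_overhead)
    return purchase_per_tb, kwh * price_per_kwh


# Hypothetical inputs: swap in your own figures.
upfront, operating = lifetime_cost_per_tb(
    purchase_per_tb=0.0,   # the "free drives" thought experiment
    watts_per_tb=5.0,      # assumed always-on power draw per TB
    cooling_overhead=0.5,  # assumed cooling energy on top of the draw
    price_per_kwh=0.10,    # assumed electricity price
    years=10,
)
print(f"upfront: {upfront:.2f}, power and cooling over 10 years: {operating:.2f}")
```

Even with the purchase price set to zero, the operating term keeps growing with every year of retention, which is the point of the paragraph above.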

What about tape as a long term storage architecture? It avoids the power and cooling problems of scale out storage by almost eliminating power consumption. Of course, tape has its own challenges. For example, slow access to that first byte of data has led many to disregard tape in favor of scale out disk. But is that the right approach? We don’t think so.

First, there are a great many workloads where access can be predicted. These are sequential workloads where things happen in a predetermined order. If tape could be programmatically set to pull this data right before it is needed, the access problem could be essentially eliminated. Second, tape can be buffered with scale out disk to create a tiering effect. The user would have to understand that they can get to all of their data, but have to wait for the really old stuff. Finally, there may be demand for a low cost alternative, where users are willing to wait while their data is retrieved from tape. Amazon Glacier is an excellent example of this.
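The first point, pulling data from tape just before a predictable workload needs it, comes down to simple scheduling. The sketch below assumes a known recall latency and a fixed schedule; `plan_recalls`, the data set names, and the timings are all hypothetical and do not describe any particular tape product or API.

```python
from datetime import datetime, timedelta


def plan_recalls(schedule, recall_latency, safety_margin=timedelta(0)):
    """Return (start_time, data_set) pairs, earliest recall first.

    schedule maps a data set name to the time the workload needs it.
    Each recall is started early enough to cover the tape latency plus
    an optional safety margin.
    """
    plan = [
        (needed_at - recall_latency - safety_margin, name)
        for name, needed_at in schedule.items()
    ]
    return sorted(plan)


# Hypothetical schedule and latency, for illustration only.
schedule = {
    "monthly-report-inputs": datetime(2030, 1, 31, 9, 0),
    "weekly-render-assets": datetime(2030, 1, 6, 9, 0),
}
plan = plan_recalls(
    schedule,
    recall_latency=timedelta(minutes=10),  # assumed time to first byte
    safety_margin=timedelta(minutes=5),
)
for start, name in plan:
    print(f"start recalling {name} at {start}")
```

A real scheduler would also have to account for drive contention and for data sets that span several cartridges, which this sketch ignores.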

Conclusion

Storing data forever actually requires both Scale Out Disk and Programmable Tape. The combination allows each technology to be used where it makes the most sense. It also makes the long term retention of data a cost-effective reality. The programmable part of tape is something we are seeing come to fruition with Spectra’s DeepStorage initiative, which we discussed in the webinar with Storage Switzerland. This technology allows users and applications to interact with tape via an Amazon S3-compatible RESTful API.

Storage Switzerland’s Lead Analyst, George Crump, and Spectra Logic’s Chief Marketing Officer, Molly Rector, discuss the forever data problem, as well as the role of tape in solving it, in our webinar “Driving Down The Cost of Forever – How To Keep Data For A Long Time”, now available on demand.