
Overview:
Virtually any governmental agency, in any country across the globe, deals with large amounts of data. This particular agency creates, collects and distributes scientific information used by other governmental offices, both U.S. and international; non-governmental agencies; other researchers; and individual citizens. Its sources of data include application output, field sensors, machines, cameras, individuals and other sources. Once data is gathered, the organization analyzes it further, categorizes it or simply stores it for possible future use.

“We gather enormous amounts of data on an ongoing basis, and none of it can be discarded. With StorCycle, we can manage project data more effectively, based on our users’ needs. The software allows us to move large data sets off of primary storage immediately after ingest, protecting it indefinitely in a lower cost tier while maintaining the right level of access throughout its lifecycle.”
– System administrator at governmental research agency

Problem:
As technology and science evolve, new exploration often draws on historical data, be it weather patterns, ocean currents, agricultural yields or mineral exploration. For this reason, the agency needed a strategy to permanently protect its data. In addition, much of the data it collects is machine generated. Examples include data from sensors that detect physical phenomena and turn them into a data stream, or calculations from algorithms that predict the risk of earth movement based on other seismic data sets.
Solution:
After an extensive search for a data management software application, the agency implemented a combination of StorCycle, Spectra’s BlackPearl® Converged Storage System, and a Spectra T950 Tape Library. StorCycle’s Project Archive feature identifies, collects and archives large data sets based on the data’s association with a given project, making it easy to manage multiple forms of data from many sources. With an Archive Directory set up in StorCycle, even machine data can be archived as soon as it comes in: high-speed disk storage handles ingest, and the data is then automatically moved to a lower-performance tier and deleted from primary storage.

As the output storage target for StorCycle, a flash-based BlackPearl Converged Storage System not only ingests the archived data at great speed but also directs it to the tape library just as quickly, easily streaming 12 or more LTO tape drives simultaneously. The final archive tier is the Spectra T950, which can hold over 11PB of uncompressed data in the footprint of a single rack and expand to hold over 120PB of uncompressed data via expansion frames, effectively extending the high-speed primary storage tier indefinitely at pennies per gigabyte. StorCycle also directs other data sets within the agency to the cloud for distribution or sharing. To control cloud costs, the organization’s cloud copy is set to expire in a year or two, while its disaster recovery copy remains on tape in perpetuity, protected from malware by tape’s physical air-gap.
- Project Archive feature
- Simple, two-tiered storage paradigm
- “Forever” retention in an object-based Perpetual Tier that is accessible and easily searchable by project name or other tagged metadata
- Large data sets may be moved immediately after creation or collection
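
For readers who want a concrete picture of the archive-on-ingest flow and the copy retention rules described in the Solution section, here is a minimal Python sketch. It is an illustration under stated assumptions, not StorCycle's implementation or API: the function names (archive_on_ingest, copy_expired, sha256_of), the use of plain directories to stand in for the primary and archive tiers, and the checksum verification step are all hypothetical choices made for this example.

```python
import hashlib
import shutil
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_on_ingest(primary: Path, archive: Path) -> List[Path]:
    """Copy every file under `primary` to `archive`, then free primary storage.

    Hypothetical stand-in for an archive directory: both tiers are ordinary
    directories here, and `archive` is assumed to live outside `primary`.
    """
    archived = []
    # Collect the file list up front so deletions do not disturb the walk.
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        target = archive / source.relative_to(primary)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Delete from primary only once the archived copy matches the source.
        if sha256_of(source) == sha256_of(target):
            source.unlink()
            archived.append(target)
    return archived


def copy_expired(created: datetime, now: datetime,
                 retention_days: Optional[int]) -> bool:
    """Return True when a copy has outlived its retention period.

    retention_days=None models a copy kept in perpetuity (the tape copy);
    a finite value models a copy that is set to expire (the cloud copy).
    """
    if retention_days is None:
        return False
    return now - created > timedelta(days=retention_days)
```

In this sketch, a cloud copy would be checked with something like copy_expired(created, datetime.now(), 365), while the disaster recovery copy would be checked with retention_days=None and so never expires.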