StorCycle Govt Agency Data Management Software (DMS)

Overview:

Virtually any governmental agency, in any country across the globe, deals with large amounts of data. This particular agency creates, collects and distributes scientific information used by other governmental offices, both U.S. and international; non-governmental agencies; other researchers; and individual citizens. Their sources of data include application output, field sensors, machines, cameras, individuals and other methods. Once data is gathered, the organization will further analyze, categorize or simply store it for possible future use.


“We gather enormous amounts of data on an ongoing basis, and none of it can be discarded. With StorCycle, we can manage project data more effectively, based on our users’ needs. The software allows us to move large data sets off of primary storage immediately after ingest, protecting it indefinitely in a lower cost tier while maintaining the right level of access throughout its lifecycle.”

– System administrator at governmental research agency


Problem:

As technology and science evolve, new exploration often draws on historical data, be it weather patterns, ocean currents, agricultural yields or mineral exploration. For this reason, the agency needed a strategy to permanently protect its data. In addition, much of the data it collects is machine generated. Examples include data from sensors that detect physical phenomena and turn them into a data stream, or calculations from algorithms that predict the risk of earth movement based on other seismic data sets.

In such cases, a researcher may deem that the output is not necessary for a current project, and rather than analyze it right away, choose to keep it for future reference. Most of the machine-generated data they receive requires high-speed disk as a landing zone. Researchers had no way to move data to lower cost storage and bring it back when needed. At great expense, this data remained on the Primary Tier of storage even if it was never accessed.

Solution:

After an extensive search for a data management software application, the agency implemented a combination of StorCycle, Spectra’s BlackPearl® Converged Storage System, and a Spectra T950 Tape Library. StorCycle’s Project Archive feature identifies, collects and archives large data sets based on their association with a given project, making it easy to manage multiple forms of data from many sources. With an Archive Directory set up in StorCycle, even machine data can be archived as soon as it arrives: high-speed disk storage handles ingest, and the data is then automatically moved to a lower performance tier and deleted from primary storage.

The Archive Directory can be associated with the project that created the data for seamless tracking, and researchers can designate its appropriate storage layer. Furthermore, the ability to query the StorCycle database means that individuals not originally associated with the research can find data throughout its lifecycle.
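
The ingest-then-archive flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not StorCycle's implementation or API: the function names (`archive_directory`, `find_by_project`), the use of plain directories for the landing zone and the archive tier, and the JSON-lines catalog are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_directory(landing: Path, archive: Path, project: str, catalog: Path) -> list[dict]:
    """Hypothetical sketch: copy each file from the landing zone to the
    archive tier under its project name, verify the copy, then delete the
    original to free primary storage."""
    records = []
    # Materialize the file list first so deletions do not disturb the walk.
    for source in sorted(p for p in landing.rglob("*") if p.is_file()):
        target = archive / project / source.relative_to(landing)
        target.parent.mkdir(parents=True, exist_ok=True)
        checksum = sha256(source)
        shutil.copy2(source, target)
        if sha256(target) != checksum:
            raise IOError(f"checksum mismatch for {source}")
        source.unlink()  # delete from primary only after a verified copy
        records.append({"project": project, "path": str(target), "sha256": checksum})
    # Append one JSON record per file so the data can be found later.
    with catalog.open("a", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
    return records


def find_by_project(catalog: Path, project: str) -> list[dict]:
    """Return every catalog record tagged with the given project name."""
    with catalog.open(encoding="utf-8") as lines:
        return [r for r in map(json.loads, lines) if r["project"] == project]
```

Deleting the source only after the checksum of the copy matches mirrors the idea that data leaves primary storage once it is safely on the lower tier, and the project tag in each catalog record is what lets someone outside the original research team look the data up later.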

As the output storage target for StorCycle, a flash-based BlackPearl Converged Storage System can both ingest the archived data and direct it to the tape library at great speed, easily streaming 12 or more LTO tape drives simultaneously. The final archive tier will be the Spectra T950, which can hold over 11PB of uncompressed data in the footprint of a single rack and expand to hold over 120PB of uncompressed data via expansion frames, effectively extending the high-speed primary storage tier indefinitely at pennies per gigabyte. StorCycle additionally directs other data sets within the agency to the cloud for distribution or sharing. To control cloud costs, the organization’s cloud copy is set to expire in a year or two, while its disaster recovery copy remains on tape in perpetuity, protected from malware by tape’s physical air-gap.

Why StorCycle:

  • Project Archive feature
  • Simple, two-tiered storage paradigm
  • “Forever” retention in an object-based Perpetual Tier that is accessible and easily searchable by project name or other tagged metadata
  • Large data sets may be moved immediately after creation or collection
