The next storage architecture: the two-tier paradigm
The traditional file-based storage interface is well suited to in-progress work but breaks down at web scale. Object storage, on the other hand, is built for scale. Rather than forcing all storage into a single model, users facing growing scale, more collaboration and more diverse workflows are moving toward a new model for data storage: a sensible combination of both.
In its 2021 Data Storage Outlook report, Spectra proposes a new two-tier architecture focused on how data is used rather than on the technology that stores it. The two-tier paradigm combines a file-based Primary, or Project, Tier, where in-progress data resides, with an object-based secondary, or Perpetual, Tier, where finished and less frequently changed data resides. Data moves seamlessly between the two tiers as it is manipulated, analyzed, shared and protected.
The Project Tier contains:
- Data ingest, where raw data streams need to be captured rapidly.
- Work-in-progress, where a user may hop around and edit data in any location and the application must respond instantly to user input.
- Computation scratch space, where the volume of data exceeds RAM and/or checkpoints are saved to stable, high-bandwidth storage. Most of it will be discarded after the job is complete; only the results will live on.
The Perpetual Tier houses:
- Project assets that must be shared across a team so they can be the basis for future work.
- Completed work that must be distributed.
- Finished computational results to be shared across researchers. Encryption and access controls, such as those available through the S3 API over HTTPS, allow for sharing of sensitive data across the public internet.
Because no single storage technology yet combines the highest performance with the lowest cost, customers will continue to face the dilemma of which data should be stored on which medium at what time. Their varying requirements call for different types of data movers to move data between the two tiers. Some customers may want to move large amounts of project data to the Perpetual Tier once a project is completed, freeing the Project Tier for new projects while keeping the project data available for future processing. Others may use the Perpetual Tier to distribute data globally to multiple groups working on the same project; in that case, a data mover allows users to “check out” a project by moving its data from the Perpetual Tier to a local Project Tier. Modern data movers and software tools that identify how data is actually used, and then move infrequently accessed data to lower tiers of storage, will improve storage efficiency and reduce storage costs.
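As a rough illustration of the usage-based movement described above, the following Python sketch scans a Project Tier directory and lists the files that have not been read or modified within a given number of days, which would make them candidates for migration to the Perpetual Tier. It is a minimal sketch under stated assumptions, not how any particular data mover works: the function name find_cold_files, the max_idle_days threshold and the use of file system timestamps as the usage signal are all hypothetical choices made for this example.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_cold_files(project_root, max_idle_days, now=None):
    """Return files under project_root that look infrequently accessed.

    Hypothetical policy for illustration: a file is "cold" when neither its
    last-access nor its last-modification time falls within max_idle_days.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    cold = []
    for path in Path(project_root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        # Access times can be unreliable (for example on volumes mounted
        # with noatime), so take the later of access and modification time.
        last_used = max(info.st_atime, info.st_mtime)
        if last_used < cutoff:
            cold.append(path)
    return sorted(cold)
```

A data mover built on a policy like this would then copy each returned file to object storage and, once the copy is verified, release the space on the Project Tier. A real tool would likely rely on richer usage data than file timestamps alone.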