Marketing Program Manager, Spectra Logic
Our society is undergoing an explosion in digital data that shows no sign of slowing down. The volume of data created each day has increased immensely and will continue to grow exponentially, especially in high performance computing (HPC) research environments. Where should organizations store their data, and what is the best way to manage it? Short-term fixes are appealing because they are easy to implement, but they often worsen long-term storage challenges associated with performance, scalability and cost. It is essential to consider future needs when examining storage options.
1. Can Cloud Work in HPC?
With its low up-front costs and ongoing op-ex pricing model, the cloud seems like an attractive option for many organizations battling the challenge of managing their ever-growing data. But when users look at the true cost of going to the cloud, it becomes clear that the ongoing costs add up over time, making cloud a more expensive solution long-term. Between the cost of getting data to and from the cloud (bandwidth) and the excessive data retrieval charges, the cloud storage model is not sustainable in an HPC environment. So why do we still hear about the cloud? This is usually related to the compute power that is in the cloud and its capability to run data analytics. When used correctly for its processing power, the cloud can be a powerful tool for the high performance computing community. However, the cloud can easily become a financial burden when used for massive data storage. It's important to understand how to use the cloud before jumping in feet first and trying to learn to swim. To learn more about the costs of moving to the cloud, check out this white paper.
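The way recurring charges overtake an up-front purchase can be sketched with some simple arithmetic. The Python below is only an illustration: the function names, the cost model (flat monthly storage plus retrieval charges versus a one-time purchase plus upkeep) and every number passed in are hypothetical placeholders, not real prices from any provider or vendor.

```python
def cumulative_cloud_cost(months, stored_tb, storage_per_tb_month,
                          retrieved_tb_per_month, retrieval_per_tb):
    """Total cloud spend after `months`, assuming a constant stored volume
    and a constant amount of data retrieved each month (an assumption)."""
    monthly = (stored_tb * storage_per_tb_month
               + retrieved_tb_per_month * retrieval_per_tb)
    return months * monthly


def cumulative_upfront_cost(months, upfront, upkeep_per_month):
    """Total spend after `months` for a system bought up front,
    with a flat monthly upkeep charge (an assumption)."""
    return upfront + months * upkeep_per_month


def breakeven_month(max_months, cloud, upfront_system):
    """First month in which cumulative cloud spend exceeds the up-front
    option, or None if that never happens within `max_months`."""
    for month in range(1, max_months + 1):
        if (cumulative_cloud_cost(month, **cloud)
                > cumulative_upfront_cost(month, **upfront_system)):
            return month
    return None


# Placeholder inputs chosen purely for illustration.
cloud = dict(stored_tb=1000, storage_per_tb_month=20.0,
             retrieved_tb_per_month=100, retrieval_per_tb=50.0)
upfront_system = dict(upfront=500_000.0, upkeep_per_month=5_000.0)

print(breakeven_month(120, cloud, upfront_system))
```

Plugging in an organization's own storage volume, retrieval pattern and quoted prices shows whether, and when, the ongoing charges cross over the up-front alternative.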
2. Multi-tier Storage for cost-effectiveness
A multi-tier storage strategy is an important concept in any data storage environment, and an effective tiered storage implementation is becoming a requirement in the HPC market. Not all data created today is of critical importance, but that is not to say that the information has no value; therefore, HPC environments need to find a way to affordably store their data long-term. Universities, for example, must keep all research, findings, notes, and general data for a minimum of seven years after completion of a federally-funded project. This is a requirement for anyone who accepts National Science Foundation (NSF) grants for research. It's a huge expense, and oftentimes the data must still be kept even though it is never accessed again.
This is where a tiered storage strategy becomes vital to the sustainability of many HPC environments. Without a tiered strategy to move inactive data to a lower-cost storage medium, these organizations would be stuck with huge storage bills for data that is considered "cold" and rarely, if ever, accessed again. Finding the correct balance between speed of access (or need to access) and the cost of storage has never been so important for these organizations.
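The idea of moving inactive data to a cheaper tier can be sketched as a simple age-based rule. The following Python is a minimal illustration under stated assumptions: the `FileRecord` type, the two tier names, the 90-day threshold and the per-terabyte rates are all hypothetical, and real tiering software typically applies richer policies than last-access age alone.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size_tb: float
    days_since_access: int


def assign_tier(record, cold_after_days=90):
    """Label data 'archive' once it has gone unaccessed for
    `cold_after_days` (a hypothetical threshold), else 'primary'."""
    if record.days_since_access >= cold_after_days:
        return "archive"
    return "primary"


def monthly_cost(records, primary_rate, archive_rate, cold_after_days=90):
    """Monthly storage bill when each record is charged at its tier's
    per-terabyte rate. Rates are placeholders, not real prices."""
    total = 0.0
    for record in records:
        tier = assign_tier(record, cold_after_days)
        rate = archive_rate if tier == "archive" else primary_rate
        total += record.size_tb * rate
    return total


# Illustrative data set: a small active project and a large cold one.
records = [
    FileRecord("active/run_042", size_tb=10.0, days_since_access=5),
    FileRecord("finished/project_a", size_tb=90.0, days_since_access=400),
]

tiered = monthly_cost(records, primary_rate=100.0, archive_rate=10.0)
# A threshold no record can reach keeps everything on primary storage.
untiered = monthly_cost(records, primary_rate=100.0, archive_rate=10.0,
                        cold_after_days=10**9)
print(tiered, untiered)
```

With these placeholder inputs most of the capacity is cold, so nearly all of the difference between the two bills comes from the data that is no longer being accessed.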
3. Active Archive for access
An active archive is a proven solution for the HPC market that has been around in the data storage industry for nearly 10 years. This automated tiered storage approach balances speed of access with cost of storage by offloading expensive primary storage while still keeping all of an organization’s data online. As organizations face the problem of storing more data with less money, an active archive solution enables them to store massive amounts of data at an affordable price, without sacrificing online access to their assets. To learn more about active archives, read the “Active Archive and the State of the Industry 2018” report by the Active Archive Alliance, a collaborative industry alliance of leading providers of tiered storage technologies, including file systems, HSM, applications, cloud storage, high-density tape and disk storage.