Data Reduction in Primary Storage: How to Improve Performance and Lower Costs with StorCycle

More than 80 percent of data is being stored on the wrong tier of storage, costing organizations millions of dollars a year. The cost benefit of managing data to different storage targets based on usage is generally well accepted in the data storage industry, and modern storage lifecycle management software can be used to reduce the overall cost of storing data by up to 70 percent. However, organizations can be hesitant to put their data on these lower-cost storage locations because it may be difficult to identify and move the inactive data.

When users are not sure what data has gone cold on primary storage, it is difficult to decide which data to move to which storage tier. To make informed decisions about what to offload from primary storage, users need to know what data they already have. Spectra Logic’s storage lifecycle management software solution, StorCycle, solves these challenges with its ability to identify inactive data based on user-defined policies.

Data Visibility for Better Storage Decisions

File age is the most important factor for determining which files should be migrated, because the older the file, the less likely it is to be accessed. With StorCycle, file age can be determined by the date the file was last accessed, last modified or created.

File size is also important for determining which files to migrate. Identifying and moving large files frees up more space and improves performance more than moving small files does. Furthermore, small files transfer to secondary storage at a slower rate (MB/s) than large files. Small files also add more to the count of files/objects in cloud and object storage, which may incur per-file charges or be subject to an object count maximum.
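As an illustration of how age and size criteria can be combined, here is a minimal Python sketch. It is not StorCycle's implementation; the function names, parameters and thresholds are hypothetical, and it relies only on the timestamps the operating system exposes through os.stat.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def file_age_days(st, basis="accessed", now=None):
    """Return a file's age in days, measured from the chosen timestamp."""
    now = time.time() if now is None else now
    stamp = {
        "accessed": st.st_atime,
        "modified": st.st_mtime,
        # st_ctime is creation time on Windows, but metadata-change time
        # on most Unix systems, so "created" is only an approximation there.
        "created": st.st_ctime,
    }[basis]
    return (now - stamp) / SECONDS_PER_DAY


def find_candidates(root, min_age_days, min_size_bytes, basis="accessed", now=None):
    """Yield (path, size_bytes, age_days) for files that are old and large enough."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age = file_age_days(st, basis, now)
            if age >= min_age_days and st.st_size >= min_size_bytes:
                yield path, st.st_size, age
```

For example, find_candidates("/data", 365, 100 * 1024**2, basis="modified") would list files under a hypothetical /data directory that are at least 100 MiB and have not been modified for a year. Note that many file systems are mounted with options that limit access-time updates, so the "accessed" basis may be less precise than it appears.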

StorCycle helps organizations continuously understand how much data they have on primary storage, how old it is, how large it is and where it is located to optimize storage decisions. The storage lifecycle management software solution will scan configured primary file systems, or specific directories, for all files – collecting, aggregating, analyzing and reporting important information about the data.
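The kind of aggregation described above can be sketched in a few lines of Python. This is an illustrative assumption about how such a report might be built, not a description of StorCycle's reports; the bucket boundaries, labels and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical age buckets: (upper limit in days, label).
AGE_BUCKETS = [
    (30, "under 30 days"),
    (365, "30 days to 1 year"),
    (float("inf"), "over 1 year"),
]


def bucket_for(age_days):
    """Return the label of the first bucket whose upper limit exceeds the age."""
    for limit, label in AGE_BUCKETS:
        if age_days < limit:
            return label
    return AGE_BUCKETS[-1][1]


def summarize(records):
    """Aggregate (directory, size_bytes, age_days) records.

    Returns a dict keyed by (directory, age bucket) holding the file count
    and total bytes, which answers how much data there is, how old it is
    and where it is located.
    """
    report = defaultdict(lambda: {"files": 0, "bytes": 0})
    for directory, size, age in records:
        entry = report[(directory, bucket_for(age))]
        entry["files"] += 1
        entry["bytes"] += size
    return dict(report)
```

A report like this makes it easy to see, for instance, that one directory holds most of the data older than a year and is therefore the best first candidate for migration.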

Performance-Driven Scanning Capabilities

There are two main considerations when developing a successful file scanner in a data management solution: the rate at which information about the files being scanned can be collected, and the rate at which that collected information can be recorded. Because reasonable scan speed is critical, StorCycle uses server memory to the greatest extent possible to optimize scan operations. Scan data is collected in memory and then written to the StorCycle database in batches for increased write performance. StorCycle also uses hundreds of parallel programming threads to collect scan data, enabling a scan rate on the order of thousands of files per second (or hundreds of millions of files per day) depending on the storage, server and network environment.
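The two techniques named above, parallel collection and batched database writes, can be illustrated with a minimal Python sketch. It assumes a SQLite database and a thread pool purely for demonstration; the table layout, batch size and worker count are hypothetical and say nothing about StorCycle's actual database or internals.

```python
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 1000  # illustrative value, not a StorCycle setting


def stat_directory(dirpath, filenames):
    """Collect metadata for the files of one directory (runs in a worker thread)."""
    rows = []
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)
        except OSError:
            continue  # skip files that disappeared or cannot be read
        rows.append((path, st.st_size, st.st_atime, st.st_mtime))
    return rows


def scan(root, conn, workers=32, batch_size=BATCH_SIZE):
    """Scan root in parallel, buffer results in memory, write them in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, atime REAL, mtime REAL)"
    )
    pending = []  # in-memory buffer of rows not yet written
    total = 0

    def flush():
        # One executemany plus one commit per batch, instead of one per file.
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", pending)
        conn.commit()
        pending.clear()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(stat_directory, dirpath, filenames)
            for dirpath, _subdirs, filenames in os.walk(root)
        ]
        for future in futures:
            rows = future.result()
            total += len(rows)
            pending.extend(rows)
            if len(pending) >= batch_size:
                flush()
    if pending:
        flush()
    return total
```

In this sketch only the per-file metadata calls run in parallel; the directory walk and all database writes stay on the calling thread, which keeps the SQLite connection single-threaded. Threads help here mainly because metadata calls against network storage spend most of their time waiting on I/O.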

To ensure that StorCycle does not overuse the resources of the scanned storage system, scan operations can be performed immediately on a per-request basis, scheduled for a specific date and time in the future, or set to run on a recurring basis. Setting a later start time allows a scan to run at a convenient time, such as outside of business hours. Job throttling is also available for NAS storage locations, meaning users can opt to set a maximum scan rate (in scan objects per second) to limit scan activity during designated peak hours. Finally, users have the choice to use the most recently completed scan rather than performing a new scan when migrating data.
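To show what a cap expressed in objects per second can look like in practice, here is a small pacing sketch in Python. The RateLimiter class is hypothetical and is not StorCycle's throttling mechanism; it simply spaces out calls so that the average rate stays at or below the configured maximum.

```python
import time


class RateLimiter:
    """Pace calls to wait() so that at most max_per_second proceed per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second  # minimum spacing between objects
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next object may be scanned."""
        delay = self.next_allowed - self.clock()
        if delay > 0:
            self.sleep(delay)
        # Schedule the next slot one interval after whichever is later:
        # the planned slot or the current time.
        self.next_allowed = max(self.next_allowed, self.clock()) + self.interval
```

A scanner would call limiter.wait() before examining each object, for example with limiter = RateLimiter(500) to stay at or below 500 objects per second. Restricting the limit to peak hours would additionally require a time-of-day check, which is omitted here.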

Project-Based Archiving

Reducing primary storage means moving infrequently accessed data to lower-cost storage, leaving a more organized primary storage infrastructure reserved for mission-critical data. But migrating data off the primary tier of storage is not limited to older data. It is also an option for large data sets that can be moved immediately after creation or collection, such as machine-generated data, completed experiment outputs and videos.

With StorCycle’s unique Project Archive feature, users can tag and move entire project data sets. This allows all project files and subdirectories to be migrated and accessed for further analysis, categorization, and comparison, all while being securely preserved for as long as needed. Supplemental metadata tags can also be added so that the project can later be easily retrieved via StorCycle’s search tool.

Sink or Swim

Those who don’t implement storage lifecycle management software solutions often find themselves enforcing storage quotas or dealing with a massive sprawl of expensive primary storage systems. By identifying and removing inactive data from the primary tier of storage, administrators can reduce primary storage costs in hardware, software and storage licensing. A reduction in expensive primary storage will also lower administrative and maintenance costs for primary storage support, and allow IT administrators to be more productive in organizing and managing their infrastructure. As the size of an organization’s primary storage is reduced, backups and replication snapshots also become smaller, leading to shorter backup windows and reduced backup storage costs. Leaving only mission-critical and important data on primary storage systems will improve performance for active data and streamline primary data access.

Spectra StorCycle provides the data visibility that organizations need to make the right decisions in managing their data storage. Learn more about the storage lifecycle management software in the Technical Guide.