The Auto Migrate method migrates inactive files according to user-defined policies based on file age and size. These files are moved to the selected storage target(s) within the Perpetual Tier. The ability of StorCycle to easily identify and automatically migrate data off of primary storage to perpetual storage is invaluable – essentially extending production storage to provide limitless capacity at a fraction of the cost.
The Project Archive method is used to migrate data sets associated with a specific project and its directory. It is typically a manually initiated process, allowing users to migrate data immediately rather than basing the migration on the age of the data. Project Archive is useful for data such as machine-generated data, completed experiment output, and videos. When a project is completed, StorCycle can migrate it from the Primary Tier to the Perpetual Tier via the Project Archive function.
Auto Migrate with Scanning
A migration job moves data from the Primary Tier to the Perpetual Tier. When configuring a new data migration job, if a scan has been performed previously, StorCycle graphically displays all scanned data within the selected storage location by file sizes and ages (last access times). This allows users to easily select which data should be moved from primary storage to a lower-cost storage medium, based on the size and time since last access. Users can choose to move all scanned files or only a portion based on file size or file age.
Often the file age is the most important property for determining which files should be migrated. The older the file, the less likely it is to be accessed. File age is typically determined by the date the file was last accessed, but StorCycle also allows file age to be based on the last modified date or creation date of the file.
File size also matters, because small files are less rewarding to migrate than large ones:
- Migrating small files from primary storage frees less space and improves performance less than migrating large files.
- Small files transfer at a slower rate (MB/s) than large files.
- Small files have a larger impact on the count of files/objects in cloud and object storage. Cloud services typically have per-file charges, and object storage systems often have an object count maximum.
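The age and size criteria described above can be illustrated with a short sketch. This is not StorCycle's implementation or API; the function names, parameters, and thresholds below are hypothetical and only show how a filter on last-access (or last-modified) time plus a minimum size might select migration candidates.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional

SECONDS_PER_DAY = 86_400


def meets_criteria(last_used_ts: float, size_bytes: int, now: float,
                   min_age_days: float, min_size_bytes: int) -> bool:
    """Return True if a file is old enough and large enough to migrate."""
    age_days = (now - last_used_ts) / SECONDS_PER_DAY
    return age_days >= min_age_days and size_bytes >= min_size_bytes


def select_candidates(root: str, min_age_days: float, min_size_bytes: int,
                      age_attr: str = "st_atime",
                      now: Optional[float] = None) -> Iterator[Path]:
    """Walk a directory tree and yield files that meet both criteria.

    age_attr picks the timestamp used as the file's age:
    "st_atime" (last access) or "st_mtime" (last modified).
    """
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # skip files that vanish or cannot be read
            if meets_criteria(getattr(st, age_attr), st.st_size, now,
                              min_age_days, min_size_bytes):
                yield path
```

For example, `select_candidates("/data/projects", min_age_days=365, min_size_bytes=1_000_000)` would yield files under a hypothetical `/data/projects` directory that have not been accessed for a year and are at least 1 MB. Raising the size threshold is one way to leave small files, which bring the drawbacks listed above, on primary storage.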
When migrating data, users have two options for the scan to use:
- Use Last Scan – Migrate files that meet the specified criteria for age and size based on the most recent scan result. This is the most efficient use of resources and is recommended whenever possible.
- Scan before Migrate – Migrate files that meet the user-specified criteria (age and size) based on a new scan. Use this option if no prior scan result is available or if the previous scan result is old.
The Perpetual Tier of storage is not limited to older data. The perpetual storage tier should also serve as an archive tier for large data sets which may be moved immediately after creation or collection.
Universities, Government Agencies, Genomics, Research Labs – These organizations create enormous amounts of data on an ongoing basis that needs to be managed outside the confines of high-performance, Tier-1 storage. Without a way to “group” data sets or to track them once they are moved from primary storage, users leave them on high-performance, high-cost storage indefinitely.
Likewise, IT departments have data sets that could be moved to lower-cost storage tiers immediately after completion, such as year-end financials, corporate videos, marketing collateral, and email archives, to name just a few.
StorCycle can be used to migrate any files/folders/directories associated with a given project. The migration job will create a manifest file which shows data migrated, where it was migrated from and where it was migrated to. Simply click on the archive job to display the manifest. The archive can also be tagged with searchable information relating to the project. This allows for simple search and restore even years or decades into the future.
File Replacement Options
To make migrated or copied data easy to locate, StorCycle introduces the concept of “replacement options,” which determine how the data is accessed in the future. StorCycle supports four user-selectable replacement options for the managed data:
- Replace the source file with an HTML link file, which users can use to restore the file(s). More details about this option are below.
- Replace the file with a symbolic link, which points to the file on NAS target storage (available in a future release).
- Move the scanned data and delete it from the primary source, leaving no replacement file or link behind. Files can be found and restored via the search tool in the StorCycle web management interface.
- Do not remove the original file from the primary source. Keep a copy of the source data on primary and make a copy on secondary storage locations.
The destination for migrated data is also user-definable. Multiple storage targets may be assigned to each job, allowing multiple copies of data to be created anywhere within the perpetual storage tier – cloud, NAS, object storage disk, tape, a replicated site, or any combination thereof. Users may also assign metadata tags for easier search and retrieval in the future.
“Packing” Files to Reduce Object Count and Improve Performance
When migrating files to a Spectra BlackPearl Converged Storage System, StorCycle offers a “pack files” option. When this option is enabled, files sent to BlackPearl are aggregated into ZIP or TAR pack files. Packing reduces the total count of objects sent to BlackPearl (currently limited to 1 billion objects). These larger, packed files offer much greater performance, both when being transferred to the BlackPearl cache and when being written from BlackPearl cache to the final storage targets. StorCycle uses a default pack size of 10GB. Once a pack meets or exceeds 10GB, it is considered “full” and no further data is written to it. For a data set with an average file size of 1MB, each full pack holds about 10,000 files, so enabling file packing would reduce the count of objects sent to BlackPearl by roughly four orders of magnitude while increasing performance by roughly one order of magnitude.
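The “full pack” rule and the object-count arithmetic can be sketched in a few lines. This is only an illustration of the rule as described, not StorCycle's packing code: the function name `plan_packs` is hypothetical, sizes are assumed to be in decimal units (1MB = 10^6 bytes, 10GB = 10^10 bytes), and only file sizes are grouped, with no ZIP or TAR output.

```python
from typing import Iterable, List

PACK_SIZE_BYTES = 10 * 1000**3  # 10GB default pack size (decimal units assumed)


def plan_packs(file_sizes: Iterable[int],
               pack_size: int = PACK_SIZE_BYTES) -> List[List[int]]:
    """Group file sizes into packs in arrival order.

    A pack is closed as soon as its total size meets or exceeds
    pack_size, so no further files are added to it.
    """
    packs: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= pack_size:
            packs.append(current)
            current, current_bytes = [], 0
    if current:  # final, partially filled pack
        packs.append(current)
    return packs


one_mb = 1000**2
packs = plan_packs([one_mb] * 100_000)   # 100,000 files of 1MB each
print(len(packs))                        # 10 packs instead of 100,000 objects
print(len(packs[0]))                     # 10000 files in each full pack
```

Under these assumptions, 100,000 objects become 10, a reduction by a factor of 10,000, which matches the four orders of magnitude cited above.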
Tracking Data Migrated and Cost Savings
StorCycle provides information on the amount of data migrated as well as the associated cost savings when migrating from the Primary Tier to the Perpetual Tier. For each storage location, administrators can provide a department and cost per terabyte (TB). StorCycle will then monitor the amount of data moved between storage locations and calculate the associated cost savings. For example, if a storage manager moved 10TB of data from a storage location that costs $1000 per TB to a tier that costs $100 per TB, the total cost savings would be displayed as $9000 (10TB × ($1000 − $100) per TB). If a department is included with the storage location, that department will be linked to the cost savings.
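The savings figure in the example follows from multiplying the amount of data moved by the difference in cost per TB. A minimal sketch of that calculation, with a per-department rollup of the kind a charge-back report might use, is shown here. The `Migration` record, its field names, and the “Genomics” department label are hypothetical and are not taken from StorCycle.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Migration:
    department: str            # hypothetical label tied to the source storage location
    tb_moved: float            # amount of data migrated, in TB
    source_cost_per_tb: float  # cost per TB on the source storage location
    target_cost_per_tb: float  # cost per TB on the target storage location


def savings(m: Migration) -> float:
    """Cost savings for one migration: TB moved times the per-TB cost difference."""
    return m.tb_moved * (m.source_cost_per_tb - m.target_cost_per_tb)


def savings_by_department(migrations: Iterable[Migration]) -> Dict[str, float]:
    """Sum the savings of all migrations linked to each department."""
    totals: Dict[str, float] = defaultdict(float)
    for m in migrations:
        totals[m.department] += savings(m)
    return dict(totals)


example = Migration("Genomics", tb_moved=10,
                    source_cost_per_tb=1000, target_cost_per_tb=100)
print(savings(example))  # 9000
```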
Migration and cost savings are reported via the StorCycle dashboard as well as in the Report section. This information can be used for storage “charge back” among various departments/groups or to help build business cases to purchase/continue to purchase secondary storage.