Webinar Q&A: How to Build the Perfect Archive

Welcome to Spectra’s webinar Q&A roundup. In this Q&A blog series, we pick relevant questions from our recent webinars and publish the responses here.

Spectra recently hosted a virtual presentation on building the perfect archive. During the webinar, we discussed how organizations can build a best-in-class archive to address data storage cost and security concerns, as well as how to configure individual storage components to meet unique data objectives.

A live poll of the webinar audience showed that over 60% of attendees don’t have a well-articulated archive strategy in place, despite seeing their data grow by more than 25% annually. While these numbers may be enough to urge most data-driven companies into action, the bigger picture suggests an even dimmer outlook. Industry statistics reveal that the average enterprise organization saw data increase 42% over the last two years. All the while, 79% of IT teams struggle to migrate data to a low-cost storage tier and 75% of companies currently have no formal data retention strategy.


There is a growing need for effective ways to store and manage digital information. Given the many use cases for a digital archive, efficient software with flexible archive options is a key component of a modern archive solution. The following questions and answers recap some of the highlights covered in the webinar, with particular focus on the software capabilities of a Spectra Digital Archive solution, powered by StorCycle enterprise software for digital preservation.


Question: What does the user see once data has been migrated? How does a symbolic link or an HTML link differ from the classic stub file?

Answer: Stub files, symbolic links and HTML links are all “breadcrumbs” left behind when a file is moved. They are how the application or user gets the file back as transparently as possible. However, the level of transparency differs by method, and that difference is a key differentiator. Stub files are traditionally used by hierarchical storage management (HSM) applications, which try to make nearline storage appear to be online. Because nearline storage can have long latency, this approach is less than ideal in environments where applications time out if the requested data does not arrive within the time they expect.

StorCycle uses symbolic links, HTML links, or both in place of the original file. Symbolic links work well when moving data to NAS disk, as a read can simply be redirected to where the file has been moved. They are ideal for environments where there is constant machine access to data. The symbolic link looks identical to the original file; the only visible difference is a small blue arrow that appears on Windows to indicate that it’s a link. HTML links are designed specifically to support storage media with longer latency, such as tape or cloud storage tiers with slower response times. When an HTML link is left in place of the migrated file, users see the same file structure and the same full file name, with “.html” appended to it. Clicking on it opens a simple window that states that the file has been archived, gives information about when, where and how this was done, and allows the user to start a restore without having to contact IT. This self-service aspect is a big differentiator, enabling users to recall individual files or entire project data sets.
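To make the two breadcrumb styles concrete, here is a minimal Python sketch of the general idea. It is an illustration only and not StorCycle's implementation: the function names `migrate_with_symlink` and `migrate_with_html_link`, the directory layout, and the wording of the HTML page are all assumptions made for this example.

```python
import html
import shutil
from datetime import datetime, timezone
from pathlib import Path


def migrate_with_symlink(src: Path, archive_dir: Path) -> Path:
    """Move src into archive_dir and leave a symbolic link at the old path.

    Hypothetical helper: reads of the old path are redirected to the new copy.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / src.name
    shutil.move(str(src), str(target))
    # The link keeps the original file name, so applications keep working.
    src.symlink_to(target.resolve())
    return target


def migrate_with_html_link(src: Path, archive_dir: Path) -> Path:
    """Move src into archive_dir and leave '<name>.html' describing the move.

    Hypothetical helper: the page content here is a placeholder, not the
    page a real product would show.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / src.name
    shutil.move(str(src), str(target))
    # Same full file name with ".html" appended, as described above.
    stub = src.with_name(src.name + ".html")
    when = datetime.now(timezone.utc).isoformat(timespec="seconds")
    stub.write_text(
        "<html><body>"
        f"<p>{html.escape(src.name)} was archived on {when}.</p>"
        f"<p>Archived location: {html.escape(str(target))}</p>"
        "</body></html>",
        encoding="utf-8",
    )
    return stub
```

Note that creating symbolic links on Windows may require elevated privileges or Developer Mode, so this sketch is easiest to try on Linux or macOS.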

Question: Does StorCycle have the ability to move data off primary storage entirely, without leaving anything behind?

Answer: There are multiple ways to set up data movement workflows with StorCycle, because the software is designed for both digital preservation and primary storage offload. Users looking to preserve their data can make a single copy of data on a new storage target or make multiple copies in multiple locations for additional data protection, all while retaining the original data on primary (source) storage. When offloading primary storage, users can leave behind symbolic or HTML links. Let’s say you move a data set that consumes 500 GB of capacity. When a symbolic or HTML link is left behind, it takes up less than 1 KB on the original storage. When that link is opened or clicked on, even by a machine or application, it simply references the data on a lower tier of storage. Finally, users can also move data entirely off the source storage without leaving anything behind. In this case, the original data is deleted from primary storage only after the data is validated to be intact and correct on secondary storage.
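The "validate, then delete" pattern in that last option can be sketched in a few lines of Python. This is a generic illustration under stated assumptions: the answer above does not say how StorCycle validates data, so the SHA-256 comparison and the helper names `sha256_of` and `offload` are hypothetical choices for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def offload(src: Path, dest_dir: Path) -> Path:
    """Copy src to dest_dir, verify the copy, then delete the original.

    Assumption: a checksum match is treated as "intact and correct".
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    expected = sha256_of(src)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    if sha256_of(dest) != expected:
        dest.unlink()  # discard the bad copy and keep the original
        raise OSError(f"checksum mismatch while offloading {src}")
    src.unlink()  # nothing is left behind on the source storage
    return dest
```

The key design point is the ordering: the original is removed only after the copy on the destination has been checked, so a failed transfer never costs data.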

Question: What kind of monitoring or reporting capabilities does your software enable?

Answer: Audit trails throughout the system give visibility into individual jobs, such as who accessed a file and when, where and how it was accessed. Users can also look at information about the storage targets under management, such as health of tape media, overall speeds and frequency of access, health of tape drives, and more.

Question: Can BlackPearl NAS be used as a primary storage target? Can I apply back-end policy-based data movement to BlackPearl NAS?

Answer: BlackPearl NAS is based on an underlying file system called ZFS, which is designed for bulk storage. It tends to perform better in scale-up applications for secondary storage. However, many Spectra customers use it as the main storage in small to medium-size office environments, particularly in workflows that are primarily bulk storage, such as research or even major networks in media and entertainment. It can also be used as a second, less costly tier of primary storage.

IT administrators can leverage StorCycle’s automatic rule capabilities to set up back-end policy-based data movement from BlackPearl NAS to secondary storage targets. With BlackPearl NAS as the storage source, StorCycle can watch the files in that system through recurring, automated scans. The software can then move data in the background based on pre-set policies. For example, it can move a set of files that have not been accessed in over 20 months to a lower-cost tier of storage. Scan and data migration operations can be scheduled to run at a convenient time, such as outside of business hours or outside designated peak hours.
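As a rough illustration of what an age-based policy scan does, the Python sketch below walks a directory tree and yields files whose last-access time is older than a threshold. It is not StorCycle's scanner: the function name `stale_files`, the 30-day month approximation, and the reliance on the file system's access time are all assumptions for this example.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional

SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # approximation used only for this sketch


def stale_files(
    root: Path, months: int = 20, now: Optional[float] = None
) -> Iterator[Path]:
    """Yield files under root whose last access is older than `months`."""
    now = time.time() if now is None else now
    cutoff = now - months * DAYS_PER_MONTH * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat() inspects the entry itself and does not follow links.
                if path.lstat().st_atime < cutoff:
                    yield path
            except OSError:
                continue  # file vanished or is unreadable; skip it
```

A scheduled job could feed the resulting paths to a migration step during off-hours. Keep in mind that access times are only as reliable as the file system makes them; volumes mounted with options such as `noatime` do not update them on reads.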

Question: How does StorCycle licensing work? Is it subscription or perpetual?

Answer: StorCycle can be purchased as a subscription or as a perpetual software entitlement. Unlike other solutions that charge based on the amount of data scanned, regardless of whether the data is moved, StorCycle is licensed based on the amount of data under management. There are different licensing tiers, each covering up to a certain capacity for a flat fee. As customers need more capacity, they can simply buy more. The tiers are designed to give a lower per-terabyte cost as capacity increases.

To view a recording of the full webinar, “SpectraLIVE: Building the Perfect Archive”, click here.