Using Your Archived Data
Author: John Kaufling | 3 min read | October 24, 2013
Archives are often thought of as static data repositories. Although the information in them isn’t used in routine daily operations, it can still be extremely valuable to the enterprise.
Data stored in archives can be used for various support applications and is critical for tasks such as compliance, security, and even diagnostics. It can also be reused to answer secondary but potentially important business questions.
“Archives may in fact be the first database in your organization that achieves big data status, in terms of growing to petabytes and storing heterogeneous information from a wide variety of sources. The fact that the archive’s purpose is to persist historical data for as-needed retrieval and analysis means it needs to be optimized for fast query, search, and reporting.”
Some academic and research organizations, for example, are required by the agencies funding their work to archive their data for public access. This also allows others to conduct secondary research using the data.
Note that archiving data is not the same as backing it up. George Crump contends that a backup/archive strategy may prove useful, particularly for organizations that do not need to comply with retention regulations.
New data structures may require Big Data sets to be in more than one archive. Each of these individual archives may be tied to a specific platform. Archiving websites, for example, typically requires at least three different archives to be created: one each for the metadata, the file data, and the database data.
Querying is the key to using archived data. It has been called a “killer app” for Big Data.
James Kobielus, writing for InfoWorld, explains:
“Telcos have long done call-detail record analysis on massively scalable archival platforms. Security incident and event monitoring, as well as antifraud applications often demand huge databases that persist and correlate event data pulled from system-level security, identity, and other systems. Many IT log analysis applications — for troubleshooting, diagnostics, and optimization — run on databases that scale from the low terabytes into multipetabyte territory. Comprehensive time-series analysis of customer, inventory, logistics, and other trends must correlate large amounts of archival data with most recent data provided from operational systems.”
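To make the last point in that quote concrete, here is a minimal sketch of correlating archived records with recent operational ones in a single query. It is only an illustration under stated assumptions: the table names (calls_archive, calls_current), the columns, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever archival platform an organization actually runs.

```python
import sqlite3


def monthly_call_totals(conn):
    """Return (month, total_minutes) rows across archived and current records."""
    query = """
        SELECT substr(call_date, 1, 7) AS month, SUM(minutes) AS total_minutes
        FROM (
            -- Hypothetical archive table holding historical call records
            SELECT call_date, minutes FROM calls_archive
            UNION ALL
            -- Hypothetical operational table holding the most recent records
            SELECT call_date, minutes FROM calls_current
        )
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query).fetchall()


# Build a tiny in-memory database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls_archive (call_date TEXT, minutes INTEGER);
    CREATE TABLE calls_current (call_date TEXT, minutes INTEGER);
""")
conn.executemany(
    "INSERT INTO calls_archive VALUES (?, ?)",
    [("2012-11-03", 12), ("2012-11-19", 8), ("2012-12-01", 5)],
)
conn.executemany(
    "INSERT INTO calls_current VALUES (?, ?)",
    [("2013-10-20", 7), ("2013-10-22", 3)],
)

totals = monthly_call_totals(conn)
for month, minutes in totals:
    print(month, minutes)
```

The design choice worth noting is that the archive is queried in place alongside current data rather than restored first, which is the kind of ready access the article argues for.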
Without the ability to readily access and query your data archives, they might as well be in a vault or even erased from tape or disk. What’s your opinion? Let us know; we’d love to hear from you.