A data science team at a large financial services company is redesigning its data lifecycle management strategy for petabytes of historical transaction data. This data is rarely accessed after one year but must be retained for seven years to meet regulatory compliance mandates. The team's primary goal is to drastically reduce storage costs. They plan to move data older than one year from standard object storage to a dedicated archival tier. When selecting the most appropriate cloud-based archival storage solution, which of the following represents the most critical trade-off to evaluate?
Data compression efficiency versus CPU overhead during ingestion.
Long-term storage cost versus data retrieval time and cost.
Geographic data replication strategy versus network latency.
Data format standardization versus schema evolution flexibility.
The correct answer identifies the fundamental trade-off of archival storage tiers. These services, such as AWS S3 Glacier Deep Archive or Azure Archive Storage, are designed to provide extremely low storage costs for vast amounts of data. In exchange for this low cost, they impose higher costs for data retrieval and significantly longer retrieval times, often measured in hours. For data that is kept for compliance but rarely accessed, this trade-off is ideal for cost optimization.
The other options are incorrect:
Data compression efficiency versus CPU overhead is a data engineering concern that is typically addressed during the data ingestion or transformation phase, before the data is moved to an archive. While it impacts storage size, it is not the defining trade-off of the archival storage tier itself.
Geographic data replication is a feature related to disaster recovery and high availability. While archival tiers are highly durable, their primary purpose is not immediate availability across regions, which is a greater concern for hot, operational data.
Data format standardization is a data governance decision made to optimize storage and query performance. The choice of format (e.g., Parquet) is separate from the choice of the storage service tier, which is based on access patterns, cost, and retrieval needs.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What are cloud-based archival storage solutions?
Open an interactive chat with Bash
Why is data retrieval cost and time more critical in archival storage than other factors?
Open an interactive chat with Bash
How does archival storage differ from standard object storage?