Your organization ingests several terabytes of user-generated videos and high-resolution images every day. The data science team needs to retain these binaries in their original form so they can be re-processed by Spark clusters for computer-vision feature extraction. Future growth is estimated in the petabyte range, and the business wants a pay-as-you-go model that allows data access by simple HTTPS calls without enforcing a fixed relational schema or frequent in-place updates. Which storage architecture BEST satisfies these requirements?
Embed the raw media files as byte-array properties inside a native graph database
Load the binaries into columnar Parquet tables managed by a Hadoop data warehouse
Deploy a distributed object store that exposes an S3-compatible (HTTP/HTTPS) API
Save each video and image in BLOB columns of a high-performance relational database cluster
Object (blob) storage is purpose-built for large volumes of unstructured data such as video, audio, and images. It exposes a flat namespace that scales almost without limit, offers low-cost pay-as-you-go pricing, and lets applications (including Spark) retrieve or overwrite entire objects via REST/HTTPS without any relational schema. Columnar Parquet tables on HDFS are optimized for structured or semi-structured analytic records, not for serving multi-gigabyte video blobs. Relational databases can store BLOB columns, but the approach quickly becomes expensive, complicates backup/restore, and degrades query performance at scale. Graph database vendors discourage embedding large binary properties; the accepted practice is to store only a URI in the graph and keep the actual media in a separate blob store. Therefore, a distributed object store with an S3-compatible API is the most appropriate choice.
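A minimal sketch of how an application addresses objects in an S3-compatible store over HTTPS. The endpoint and bucket names here are hypothetical, and real requests would be authenticated (e.g., AWS Signature V4 via an SDK such as boto3) rather than fetched from a bare URL; the point is only that each binary maps to a flat-namespace key reachable with a plain HTTPS call:

```python
# Sketch only: hypothetical endpoint/bucket names, no authentication.
# Shows how a flat-namespace object key maps to a virtual-hosted-style
# HTTPS URL of the form https://<bucket>.<endpoint>/<key>.
from urllib.parse import quote

def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build the HTTPS URL for an object in an S3-compatible store."""
    # quote() percent-encodes unsafe characters but keeps "/" intact;
    # the slashes are part of one flat key, not real directories.
    return f"https://{bucket}.{endpoint}/{quote(key)}"

url = object_url("s3.example.com", "media-ingest",
                 "2024/06/01/cam7/frame_000123.jpg")
print(url)
# https://media-ingest.s3.example.com/2024/06/01/cam7/frame_000123.jpg
```

Because keys are just strings, a Spark job can list everything under a prefix such as `2024/06/01/` and re-process those binaries in parallel without any schema or in-place updates, which is exactly the access pattern the scenario describes.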