A financial services company wants to create a central repository to store massive volumes of diverse data for future analysis. The repository must be able to ingest real-time market data, structured transactional records, semi-structured JSON files from web APIs, and unstructured text from news feeds. The data science team requires the flexibility to explore this raw data with various tools and apply different analytical models without being restricted by a predefined structure. Which data repository would be the MOST appropriate for this scenario?
A data lake is the correct choice because it is a centralized repository designed to store vast quantities of data in its native, raw format. This approach supports structured, semi-structured, and unstructured data types. Data lakes use a 'schema-on-read' strategy, meaning the structure is applied when the data is read for analysis, not when it is stored. This provides the flexibility required by the data science team to explore raw data and experiment with different analytical models.
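As a rough illustration of schema-on-read, the sketch below keeps records exactly as they arrived and imposes structure only when a reader asks for it. The sample records, the `raw_store` list, and the `read_trades` and `read_headlines` helpers are all hypothetical, and a plain Python list stands in for the lake's storage layer.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
# Nothing is validated or reshaped at storage time.
raw_store = [
    '{"type": "trade", "symbol": "ABC", "price": 101.5, "qty": 200}',
    '{"type": "news", "headline": "ABC beats earnings estimates"}',
    '{"type": "trade", "symbol": "XYZ", "price": "47.25"}',
]


def read_trades(lines):
    """Apply a trade schema at read time, ignoring records of other types."""
    trades = []
    for line in lines:
        record = json.loads(line)
        if record.get("type") != "trade":
            continue
        trades.append({
            "symbol": str(record["symbol"]),
            "price": float(record["price"]),   # coerce strings such as "47.25"
            "qty": int(record.get("qty", 0)),  # default when the field is absent
        })
    return trades


def read_headlines(lines):
    """Apply a different schema to the same raw data: news headlines only."""
    records = (json.loads(line) for line in lines)
    return [r["headline"] for r in records if r.get("type") == "news"]


trades = read_trades(raw_store)
headlines = read_headlines(raw_store)
```

Both readers work from the same untouched records, which is the flexibility the scenario asks for: a new analysis only needs a new reader, not a reload of the data.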
A relational database is incorrect because it requires a predefined schema (schema-on-write) and is best suited for structured data, not the mix of structured, semi-structured, and unstructured data described in the scenario.
A data warehouse is also incorrect. While it can store large amounts of data for analysis, it requires data to be cleaned, transformed, and structured before being loaded (schema-on-write). This would not allow the data science team to work with the raw, unaltered data.
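For contrast, schema-on-write can be sketched with Python's built-in `sqlite3` module. The `trades` table, its columns, and the `load_trade` helper are assumptions made for illustration, not part of the scenario; the point is only that the structure is fixed before any data arrives, so a record that does not fit is rejected at load time.

```python
import sqlite3

# The schema is declared up front, before any data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades ("
    " symbol TEXT NOT NULL,"
    " price REAL NOT NULL,"
    " qty INTEGER NOT NULL)"
)


def load_trade(record):
    """Return True if the record was stored, False if the schema rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO trades (symbol, price, qty) VALUES (?, ?, ?)",
                (record.get("symbol"), record.get("price"), record.get("qty")),
            )
        return True
    except sqlite3.IntegrityError:
        # A missing field becomes NULL and violates a NOT NULL constraint.
        return False


stored = load_trade({"symbol": "ABC", "price": 101.5, "qty": 200})
rejected = load_trade({"headline": "ABC beats earnings estimates"})
row_count = conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
```

Here the news record never reaches storage, so it would have to be transformed to fit the table or dropped, which is the restriction the data science team wants to avoid.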
A data mart is incorrect because it is a subset of a data warehouse focused on a specific business line or department. Like a warehouse, it holds structured, pre-processed data, and it lacks the scale and flexibility needed to store raw data from many different sources for analysis that has not yet been defined.
CompTIA Data+ DA0-002 (V2)
Data Concepts and Environments