CompTIA DataX DY0-001 (V1) Practice Question

A data engineering team is building a daily batch processing pipeline that ingests terabytes of semi-structured JSON web server logs. The processed data must be stored to serve two primary use cases:

  1. Enable fast, columnar-based analytical queries for business intelligence dashboards.
  2. Provide a cost-effective, long-term archive for retraining machine learning models.

Given these requirements, which of the following persistence strategies for the processed data offers the best balance of query performance, cost efficiency, and schema evolution support?

  • Convert the data to Parquet format and persist it in a cloud-based object storage service.

  • Persist the raw JSON data directly in a distributed file system without any format conversion.

  • Load the flattened JSON data into a transactional, row-oriented relational database.

  • Store the final processed data in a distributed in-memory cache.
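
For readers unfamiliar with the terms in the scenario, the following is a minimal, standard-library-only sketch of what "semi-structured" and "columnar" mean in practice. It flattens nested JSON log records and pivots them from rows into columns. The sample records, field names, and the helper names `flatten` and `to_columns` are illustrative assumptions, not part of the question. It does not write any real file format; an actual pipeline would likely rely on a dedicated library for that.

```python
import json

# Hypothetical sample log lines; the field names are assumptions for illustration.
RAW_LOGS = [
    '{"ts": "2024-01-01T00:00:00Z", "status": 200, '
    '"request": {"path": "/", "method": "GET"}}',
    '{"ts": "2024-01-01T00:00:01Z", "status": 404, '
    '"request": {"path": "/x", "method": "GET"}, "referrer": "example.com"}',
]


def flatten(record, prefix=""):
    """Turn nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    A field missing from a row becomes None, which is one simple way
    to tolerate records whose set of fields changes over time.
    """
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return {name: [row.get(name) for row in rows] for name in names}


rows = [flatten(json.loads(line)) for line in RAW_LOGS]
columns = to_columns(rows)

# An analytical query that needs one field reads only that column's list.
statuses = columns["status"]
```

In the row layout, a query over `status` has to parse every full record; in the column layout it touches a single list, which is the access pattern the first use case describes.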

CompTIA DataX DY0-001 (V1)
Operations and Processes