CompTIA DataX DY0-001 (V1) Practice Question

A data-engineering team stores click-stream events as Parquet files in HDFS. The files are currently written with gzip compression, and CPU profiling shows that nearly 40% of interactive Spark-SQL query time is spent in the decompression step. Storage cost is acceptable; the primary goal is to lower query latency by switching to a different compressed format that is natively supported by Hadoop. Which codec should the team adopt to minimize decompression overhead while still providing a reasonable compression ratio?

  • Replace gzip with bzip2 compression (.bz2)

  • Switch to LZ4 compression

  • Switch to Snappy compression

  • Keep using the default gzip (Deflate) codec

CompTIA DataX DY0-001 (V1)
Operations and Processes
Your Score:
Settings & Objectives
Random Mixed
Questions are selected randomly from all chosen topics, with a preference for those you haven’t seen before. You may see several questions from the same objective or domain in a row.
Rotate by Objective
Questions cycle through each objective or domain in turn, helping you avoid long streaks of questions from the same area. You may see some repeat questions, but the distribution will be more balanced across topics.

Check or uncheck an objective to set which questions you will receive.

SAVE $64
$529.00 $465.00
Bash, the Crucial Exams Chat Bot
AI Bot