AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team launches a transient Amazon EMR cluster each night through an AWS Step Functions workflow. Before any Spark job runs, the cluster must have a proprietary JDBC driver installed on every node. After installation, a PySpark ETL script stored in Amazon S3 must be executed. What is the most operationally efficient way to meet these requirements using native EMR scripting capabilities?

  • Configure a bootstrap action that downloads and installs the driver on all nodes, then add an EMR step that runs spark-submit on the PySpark script in Amazon S3.

  • Schedule an EMR Notebook that first installs the driver with pip commands and then executes the PySpark code, triggered nightly by a cron expression.

  • Build a custom AMI with the driver pre-installed and specify the PySpark ETL through classification properties when creating the cluster.

  • Pass a shell script to a Hadoop Streaming step that both installs the driver and calls the PySpark script in a single command.
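The bootstrap-action-plus-step pattern described in the first option can be sketched as follows. All bucket names, file paths, and the release label are hypothetical placeholders, not values from the question; the dict mirrors the shape of a boto3 `emr.run_job_flow` request (or the equivalent Step Functions `createCluster` parameters) but is built with plain Python so nothing here calls AWS.

```python
# Sketch of the bootstrap-action + EMR-step pattern (option 1).
# All S3 paths are hypothetical placeholders. The resulting dict has the
# shape expected by boto3's emr.run_job_flow(**config), or the equivalent
# Parameters block of a Step Functions elasticmapreduce:createCluster task.

def build_cluster_config(bucket: str) -> dict:
    """Assemble an EMR cluster request with a bootstrap action that
    installs a JDBC driver on every node, followed by a step that
    runs spark-submit on a PySpark script stored in S3."""
    return {
        "Name": "nightly-transient-etl",
        "ReleaseLabel": "emr-6.15.0",
        # Bootstrap actions run on every node before applications start,
        # which is what makes them suitable for node-level driver installs.
        "BootstrapActions": [
            {
                "Name": "install-jdbc-driver",
                "ScriptBootstrapAction": {
                    "Path": f"s3://{bucket}/bootstrap/install_driver.sh"
                },
            }
        ],
        # Steps run after the cluster is ready; command-runner.jar lets a
        # step invoke spark-submit directly against a script in S3.
        "Steps": [
            {
                "Name": "run-pyspark-etl",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode", "cluster",
                        f"s3://{bucket}/scripts/etl_job.py",
                    ],
                },
            }
        ],
    }

config = build_cluster_config("my-etl-bucket")
print(config["Steps"][0]["HadoopJarStep"]["Args"][0])  # spark-submit
```

Because bootstrap actions always complete before any step runs, this ordering guarantees the driver is present on every node before the PySpark job starts, with no custom AMI maintenance or notebook scheduling required.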

Domain: Data Operations and Support