AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team launches a transient Amazon EMR cluster each night through an AWS Step Functions workflow. Before any Spark job runs, the cluster must have a proprietary JDBC driver installed on every node. After installation, a PySpark ETL script stored in Amazon S3 must be executed. What is the most operationally efficient way to meet these requirements using native EMR scripting capabilities?
Configure a bootstrap action that downloads and installs the driver on all nodes, then add an EMR step that runs spark-submit on the PySpark script in Amazon S3.
Schedule an EMR Notebook that first installs the driver with pip commands and then executes the PySpark code, triggered nightly by a cron expression.
Build a custom AMI with the driver pre-installed and specify the PySpark ETL through classification properties when creating the cluster.
Pass a shell script to a Hadoop Streaming step that both installs the driver and calls the PySpark script in a single command.
Bootstrap actions run on every node while the cluster is being provisioned, before Amazon EMR installs and starts the cluster applications, which makes them well suited to installing additional software such as a JDBC driver before any job starts. Once the cluster is ready, an EMR step can invoke spark-submit to run a PySpark script that resides in Amazon S3. This combination uses built-in EMR scripting features, requires no custom AMI maintenance, and fits cleanly into an automated Step Functions orchestration. The notebook option falls short because pip installs Python packages rather than a JDBC driver, a notebook does not install software on all nodes automatically, and notebooks are harder to schedule. A custom AMI can deliver the driver but adds ongoing image-management overhead, and classification properties configure applications; they do not run a script. Using a Hadoop Streaming step for software installation and Spark execution is possible but is not what the feature is intended for, does not guarantee that the installation runs on every node, and complicates the workflow.
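To make the correct option concrete, here is a minimal sketch of how the nightly workflow could be laid out as a Step Functions state machine definition, built as a plain Python dictionary. It relies on the Step Functions service integrations for EMR (`createCluster.sync`, `addStep.sync`, `terminateCluster.sync`) and on `command-runner.jar` to call `spark-submit`. The bucket name, script paths, cluster name, release label, instance types, and role names are illustrative assumptions, not values from the question, and the bootstrap shell script itself is hypothetical and not shown.

```python
import json

# Hypothetical S3 locations (assumptions for illustration only).
BOOTSTRAP_SCRIPT = "s3://example-bucket/bootstrap/install_jdbc_driver.sh"
ETL_SCRIPT = "s3://example-bucket/scripts/nightly_etl.py"


def build_definition() -> dict:
    """Return an Amazon States Language definition for the nightly run."""
    create_cluster = {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
        "Parameters": {
            "Name": "nightly-etl",            # assumed cluster name
            "ReleaseLabel": "emr-6.15.0",     # example release label
            "Applications": [{"Name": "Spark"}],
            "ServiceRole": "EMR_DefaultRole",       # assumed role names
            "JobFlowRole": "EMR_EC2_DefaultRole",
            # The bootstrap action runs on every node during provisioning.
            "BootstrapActions": [
                {
                    "Name": "InstallJdbcDriver",
                    "ScriptBootstrapAction": {"Path": BOOTSTRAP_SCRIPT, "Args": []},
                }
            ],
            "Instances": {
                # Keep the cluster alive so the workflow can add a step to it.
                "KeepJobFlowAliveWhenNoSteps": True,
                "InstanceGroups": [
                    {"Name": "Primary", "InstanceRole": "MASTER",
                     "InstanceType": "m5.xlarge", "InstanceCount": 1},
                    {"Name": "Core", "InstanceRole": "CORE",
                     "InstanceType": "m5.xlarge", "InstanceCount": 2},
                ],
            },
        },
        "ResultPath": "$.cluster",
        "Next": "RunEtl",
    }
    run_etl = {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            "ClusterId.$": "$.cluster.ClusterId",
            "Step": {
                "Name": "NightlyPySparkEtl",
                # If the step fails, EMR shuts the cluster down itself.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", ETL_SCRIPT],
                },
            },
        },
        "ResultPath": "$.step",
        "Next": "TerminateCluster",
    }
    terminate = {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
        "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
        "End": True,
    }
    return {
        "Comment": "Transient EMR cluster: bootstrap action, then Spark step",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": create_cluster,
            "RunEtl": run_etl,
            "TerminateCluster": terminate,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The ordering the question asks for falls out of the structure: the driver installation is part of cluster creation, so it has finished on every node before the `RunEtl` state can submit the Spark job. How the Spark job locates the driver JAR depends on where the hypothetical bootstrap script places it, so that detail is left out of the sketch.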
AWS Certified Data Engineer Associate DEA-C01
Data Operations and Support