AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineer must explore a 200 GB CSV data lake on Amazon S3, remove duplicate rows, and check for malformed records. Company policy prohibits long-running clusters, and the engineer wants to perform the work from an existing Jupyter notebook in Amazon SageMaker Studio with minimal infrastructure to manage. Which approach meets these requirements while keeping costs low?
Launch an AWS Glue interactive session from the SageMaker Studio notebook by switching to the Glue PySpark kernel and process the data with Apache Spark.
Create an Amazon EMR cluster with JupyterHub enabled, attach the notebook to the cluster, and terminate the cluster after processing.
Use the Athena for Apache Spark notebook interface to open a new serverless Spark session and connect the SageMaker Studio notebook to it with a JDBC driver.
Run ad-hoc Amazon Athena SQL queries from the notebook with the Boto3 SDK to identify and delete bad or duplicate rows.
AWS Glue interactive sessions let a SageMaker Studio notebook start a temporary, serverless Spark environment by selecting the Glue PySpark kernel. The session starts in seconds, is billed per second for the DPUs it uses, and stops automatically after its idle timeout, which satisfies the no-persistent-cluster policy and keeps costs low. An EMR cluster or a self-managed EC2 instance must be provisioned, managed, and terminated by hand, so it adds infrastructure overhead even when it is short-lived. Standard Athena SQL cannot easily perform row-level cleansing of malformed records, and Athena for Apache Spark notebooks are available only in the Athena console, not from a SageMaker Studio Jupyter environment.
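The two cleansing steps the question asks for, removing duplicate rows and flagging malformed records, can be illustrated on a small sample. The sketch below is not Glue or Spark code: it uses only the Python standard library, and the function name `clean_rows`, the sample data, and the rule "a row is malformed when its column count is wrong" are assumptions made for illustration. In a Glue interactive session the same logic would more likely be expressed with Spark, for example `DataFrame.dropDuplicates()` together with the CSV reader's `PERMISSIVE` mode and a corrupt-record column.

```python
import csv
import io


def clean_rows(csv_text, expected_columns):
    """Return (header, unique well-formed rows, malformed rows with line numbers).

    Hypothetical helper: a row counts as malformed when it does not have
    exactly `expected_columns` fields. Real data may need stricter checks.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    seen = set()        # full rows already emitted, used to detect duplicates
    unique_rows = []
    malformed = []
    for row in reader:
        if len(row) != expected_columns:
            # reader.line_num is the number of source lines read so far
            malformed.append((reader.line_num, row))
            continue
        key = tuple(row)
        if key in seen:
            continue    # exact duplicate of an earlier row, so drop it
        seen.add(key)
        unique_rows.append(row)
    return header, unique_rows, malformed


# Made-up sample: one exact duplicate, one short row, one row with an extra field.
SAMPLE = (
    "id,name,amount\n"
    "1,alice,10.5\n"
    "2,bob,7.0\n"
    "1,alice,10.5\n"
    "3,carol\n"
    "4,dave,1.0,extra\n"
)

header, rows, bad = clean_rows(SAMPLE, expected_columns=3)
print(len(rows), "unique rows;", len(bad), "malformed rows")
```

Holding every row in an in-memory set works for a sample but not for 200 GB, which is why the explanation points to a distributed Spark session for the real data.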
Data Operations and Support