Data Pipelines and ETL Processes Flashcards

AWS Certified Data Engineer Associate DEA-C01 Flashcards

Front	Back
Advantages of AWS Step Functions for ETL	Enables orchestration and monitoring of workflows using state machines
Difference between Glue Jobs and Glue Workflows	Jobs handle specific ETL tasks while workflows orchestrate multiple jobs and crawlers
Difference between on-demand and scheduled triggers in Glue	On-demand triggers fire only when started manually, while scheduled triggers fire automatically on a cron-based schedule
How Data Pipeline handles retry logic	Automatically retries failed activities up to a configured maximum number of attempts
How Glue integrates with S3	Glue jobs read datasets stored in S3, transform them, and can write the results back to S3 or load them into another target
How Step Functions differ from Glue Workflow	Glue Workflows orchestrate Glue jobs, crawlers, and triggers; Step Functions offer more flexibility for orchestrating complex workflows across many AWS services
Key benefits of using AWS Glue	Serverless with automatic scaling, reduced coding effort, and integrated tools for data preparation
Optimal use case for AWS Glue vs Data Pipeline	Glue for complex ETL; Data Pipeline for simpler scheduled data copy/movement
Primary benefit of using Glue with Redshift	Simplifies transforming large-scale datasets and loading them into Redshift for querying
Purpose of AWS Glue Data Catalog	A centralized metadata repository for data assets that integrates with other AWS services
What is an AWS Glue Crawler	A tool to automatically infer schema and metadata of data stored in various sources
What is AWS Data Pipeline	A service to schedule and automate data movement and transformation
What is AWS Glue	A managed ETL service used to prepare and transform data for analytics
What is ETL	Extract, Transform, Load: a process that extracts data from source systems, transforms it into a usable format, and loads it into a target system
What is partitioning in AWS Glue	Dividing data into subsets based on a key to optimize querying and storage
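
The ETL and partitioning cards above can be made concrete with a small sketch. This is a plain-Python illustration, not Glue code: the sample data, the `orders` prefix, and the function names `extract`, `transform`, and `load` are all hypothetical. It extracts rows from CSV text, transforms them by casting types and deriving `year` and `month`, and loads them under Hive-style `key=value/` prefixes, the layout commonly used for partitioned data in S3.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; in practice this would come from a source system.
RAW_CSV = """order_id,order_date,amount
1,2024-01-15,19.99
2,2024-01-20,5.00
3,2024-02-03,42.50
"""

def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and derive the partition columns year and month."""
    out = []
    for row in rows:
        year, month, _day = row["order_date"].split("-")
        out.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "year": year,
            "month": month,
        })
    return out

def load(rows, prefix="orders"):
    """Load: group rows under Hive-style partition prefixes (key=value/)."""
    target = defaultdict(list)
    for row in rows:
        key = f"{prefix}/year={row['year']}/month={row['month']}/"
        target[key].append({"order_id": row["order_id"], "amount": row["amount"]})
    return dict(target)

partitions = load(transform(extract(RAW_CSV)))
for key in sorted(partitions):
    print(key, len(partitions[key]))
```

Because each month lands under its own prefix, a query that filters on `year` and `month` only needs to read the matching prefixes rather than the whole dataset, which is the optimization the partitioning card refers to.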

AWS Certified Data Engineer Associate DEA-C01

This deck focuses on AWS services like Glue, Data Pipeline, and Step Functions for building, managing, and optimizing data workflows and ETL processes.
