Data Pipelines and ETL Processes Flashcards
AWS Certified Data Engineer Associate DEA-C01 Flashcards
| Front | Back |
| --- | --- |
| Advantages of AWS Step Functions for ETL | Enables orchestration and monitoring of workflows using state machines |
| Difference between Glue Jobs and Glue Workflows | Jobs handle specific ETL tasks while workflows orchestrate multiple jobs and crawlers |
| Difference between on-demand and scheduled triggers in Glue | On-demand triggers fire only when started manually (console, CLI, or API); scheduled triggers fire automatically on a cron-based schedule |
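| How Data Pipeline handles retry logic | Automatically retries a failed activity up to a configurable limit (the activity's `maximumRetries` field), waiting `retryDelay` between attempts |
| How Glue integrates with S3 | Glue jobs read source datasets from S3 and can write transformed output back to S3; crawlers catalog S3 data so it can be queried |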
| How Step Functions differ from Glue Workflows | Step Functions can orchestrate many AWS services with branching, retries, and error handling; Glue Workflows orchestrate only Glue jobs, crawlers, and triggers |
| Key benefits of using AWS Glue | Serverless with automatic scaling, generates ETL code to reduce hand-coding, and bundles cataloging and transformation tools for data preparation |
| Optimal use case for AWS Glue vs Data Pipeline | Glue for complex ETL; Data Pipeline for simpler scheduled data copy/movement |
| Primary benefit of using Glue with Redshift | Simplifies loading large datasets into Redshift so they can be queried at scale |
| Purpose of AWS Glue Data Catalog | A centralized metadata repository for data assets that integrates with other AWS services |
| What is an AWS Glue Crawler | A tool that scans data stores, automatically infers schema and metadata, and creates or updates the corresponding tables in the Glue Data Catalog |
| What is AWS Data Pipeline | A service to schedule and automate data movement and transformation |
| What is AWS Glue | A managed ETL service used to prepare and transform data for analytics |
| What is ETL | Extract, Transform, Load: a process that extracts data from sources, transforms it into a usable format, and loads it into a target system |
| What is partitioning in AWS Glue | Dividing data into subsets by the values of a key (for example, date) so queries can skip irrelevant partitions and storage stays organized |
This deck focuses on AWS services like Glue, Data Pipeline, and Step Functions for building, managing, and optimizing data workflows and ETL processes.
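The Step Functions cards describe orchestration with state machines. As a minimal sketch, the Python below builds an Amazon States Language definition that runs a single Glue job through the `arn:aws:states:::glue:startJobRun.sync` service integration, retries failures, and routes persistent failure to a `Fail` state. The job name `example-etl-job` and the retry values are assumptions, and `transitions_are_valid` is a hypothetical local check, not part of any AWS SDK.

```python
import json

def glue_etl_state_machine(job_name: str) -> dict:
    """Build an Amazon States Language definition that runs one Glue job with retries."""
    return {
        "Comment": "Run a Glue ETL job, retry transient failures, then succeed or fail",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # Glue service integration; the .sync suffix waits for the job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                # Retry any error twice, doubling the wait each time (values are illustrative).
                "Retry": [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 60,
                           "MaxAttempts": 2, "BackoffRate": 2.0}],
                # Once retries are exhausted, route to the failure state.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "JobFailed"}],
                "Next": "JobSucceeded",
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "GlueJobFailed",
                          "Cause": "The Glue job failed after all retries"},
        },
    }

def transitions_are_valid(definition: dict) -> bool:
    """Local sanity check: StartAt and every Next/Catch target must name an existing state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        if "Next" in state:
            targets.append(state["Next"])
        targets.extend(c["Next"] for c in state.get("Catch", []))
    return all(t in states for t in targets)

if __name__ == "__main__":
    definition = glue_etl_state_machine("example-etl-job")
    assert transitions_are_valid(definition)
    print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document supplied as a state machine definition; parallel branches or tasks for other AWS services would be added as further entries under `States`, which is the flexibility the Glue Workflows comparison card refers to.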
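The Glue Jobs vs. Workflows card says a workflow orchestrates multiple jobs and crawlers. Glue expresses those dependencies with triggers; the sketch below models only the ordering idea locally with Python's standard `graphlib` module. The workflow and its step names (`crawl_raw`, `clean_orders`, and so on) are hypothetical, and each step lists the steps that must finish before it can start.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps that must succeed first.
WORKFLOW = {
    "crawl_raw": set(),                                    # crawler catalogs the raw data
    "clean_orders": {"crawl_raw"},                         # job
    "clean_customers": {"crawl_raw"},                      # job
    "join_and_load": {"clean_orders", "clean_customers"},  # job
}

def run_stages(workflow: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into stages; every step in a stage can run once earlier stages finish."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        stages.append(ready)
        sorter.done(*ready)  # pretend every step in this stage succeeded
    return stages

if __name__ == "__main__":
    for number, stage in enumerate(run_stages(WORKFLOW), start=1):
        print(number, stage)
```

The two cleaning jobs land in the same stage because they share one predecessor, which is the same situation in which a Glue workflow could run them in parallel after the crawler completes.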
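The trigger card contrasts on-demand and scheduled triggers. A minimal sketch, assuming the request fields of the Glue `CreateTrigger` API (`Name`, `Type`, `Schedule`, `Actions`, `StartOnCreation`), shows how the two differ only in type and schedule. The helper `glue_trigger_request`, the trigger names, and the job name are hypothetical, and the code only builds dictionaries; it does not call AWS.

```python
from typing import Optional

def glue_trigger_request(name: str, job_name: str, cron: Optional[str] = None) -> dict:
    """Build keyword arguments for the Glue CreateTrigger API.

    Without a cron expression the trigger is ON_DEMAND (it fires only when started
    manually); with one it is SCHEDULED and fires at the times the expression describes.
    """
    request = {"Name": name, "Actions": [{"JobName": job_name}]}
    if cron is None:
        request["Type"] = "ON_DEMAND"
        return request
    # Glue cron expressions use six fields: minutes hours day-of-month month day-of-week year.
    fields = cron.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 cron fields, got {len(fields)}: {cron!r}")
    request["Type"] = "SCHEDULED"
    request["Schedule"] = f"cron({cron})"
    request["StartOnCreation"] = True  # activate the schedule as soon as it is created
    return request

if __name__ == "__main__":
    print(glue_trigger_request("manual-run", "example-etl-job"))
    print(glue_trigger_request("nightly-run", "example-etl-job", "0 2 * * ? *"))
```

With boto3, such a dictionary could be passed as `boto3.client("glue").create_trigger(**request)`; the ON_DEMAND trigger is then fired explicitly with `start_trigger(Name=...)`, while the scheduled one (here every day at 02:00) needs no further action.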
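The Data Pipeline retry card can be illustrated with a local simulation. This is not the Data Pipeline API: `run_with_retries` is a hypothetical helper whose `maximum_retries` and `retry_delay_s` parameters loosely mirror an activity's `maximumRetries` and `retryDelay` fields, and `flaky_copy` is a made-up activity that fails twice before succeeding.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_retries(activity: Callable[[], T], maximum_retries: int,
                     retry_delay_s: float = 0.0) -> T:
    """Run activity; on failure, retry up to maximum_retries more times, then re-raise."""
    retries_used = 0
    while True:
        try:
            return activity()
        except Exception:
            if retries_used >= maximum_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            time.sleep(retry_delay_s)  # wait before the next attempt

if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_copy() -> str:
        """Hypothetical activity that fails twice before succeeding."""
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("transient failure")
        return "copied"

    print(run_with_retries(flaky_copy, maximum_retries=2), attempts["count"])
```

With `maximum_retries=2` the activity gets three attempts in total (the first run plus two retries), so the example succeeds on its third call; with a limit of 1 the final error would propagate to the caller.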
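The crawler card says a crawler infers schema and metadata. A toy version of that idea scans newline-delimited JSON samples and assigns each column a type, widening conflicts as it goes. Real crawlers use classifiers and richer types (struct, array, and others); the Hive-style type names and the widening rules here are simplifying assumptions for illustration.

```python
import json

def infer_type(value) -> str:
    """Map a JSON value to a Hive-style column type name (simplified)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, plus nested objects/arrays flattened to string for brevity

def infer_schema(json_lines: str) -> dict[str, str]:
    """Infer a column -> type mapping from newline-delimited JSON sample records."""
    schema: dict[str, str] = {}
    for line in json_lines.strip().splitlines():
        for column, value in json.loads(line).items():
            if value is None:
                continue  # null carries no type information
            new_type = infer_type(value)
            old_type = schema.get(column, new_type)
            if {old_type, new_type} == {"bigint", "double"}:
                schema[column] = "double"  # widen mixed integer/float columns
            elif old_type != new_type:
                schema[column] = "string"  # incompatible types fall back to string
            else:
                schema[column] = new_type
    return schema

if __name__ == "__main__":
    sample = '{"id": 1, "price": 2, "active": true}\n{"id": 2, "price": 2.5, "active": false}'
    print(infer_schema(sample))
```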
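The ETL card's three steps map directly onto three functions. This is a minimal local sketch, not a Glue job: the CSV sample is made up, and an in-memory SQLite database stands in for the target system (a warehouse such as Redshift in this deck's context).

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV standing in for a source system export.
RAW_CSV = """order_id,amount,currency
1,19.99,usd
2,5.00,USD
3,,usd
"""

def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict[str, str]]) -> list[tuple[int, float, str]]:
    """Transform: drop rows missing an amount, cast types, normalize currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return cleaned

def load(records: list[tuple[int, float, str]], conn: sqlite3.Connection) -> int:
    """Load: write the cleaned records into a target table and return its row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(transform(extract(RAW_CSV)), conn))  # order 3 is dropped, so 2 rows load
```

Keeping each stage a separate function mirrors how ETL tools separate sources, transformations, and sinks, so any one stage can be swapped (for example, a different target) without touching the others.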
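Partitioning in Glue typically relies on Hive-style `key=value` folder names in S3, which crawlers register as partition columns. The sketch below groups hypothetical records under such paths and then prunes by a filter value; the prefix `s3://example-bucket/events` and the `dt` key are placeholders, and nothing is actually written to S3.

```python
from collections import defaultdict

# Hypothetical event records; "dt" is the partition key.
EVENTS = [
    {"dt": "2024-01-01", "user": "a"},
    {"dt": "2024-01-01", "user": "b"},
    {"dt": "2024-01-02", "user": "c"},
]

def partition(records: list[dict], key: str, prefix: str) -> dict[str, list[dict]]:
    """Group records under Hive-style prefix/key=value/ paths, one path per partition."""
    parts: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        parts[f"{prefix}/{key}={rec[key]}/"].append(rec)
    return dict(parts)

def prune(parts: dict[str, list[dict]], key: str, value: str) -> list[dict]:
    """Partition pruning: read records only from paths whose key=value matches the filter."""
    return [rec
            for path, recs in parts.items()
            if f"/{key}={value}/" in path  # non-matching partitions are never scanned
            for rec in recs]

if __name__ == "__main__":
    parts = partition(EVENTS, "dt", "s3://example-bucket/events")
    for path in parts:
        print(path)
    print(prune(parts, "dt", "2024-01-02"))
```

A query filtered on `dt` only has to touch one folder, which is why choosing a partition key that matches common query filters matters more than the number of partitions.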