Data Processing in Azure Flashcards
Microsoft Azure Data Fundamentals DP-900 Flashcards

| Front | Back |
| --- | --- |
| How does Azure Monitor assist in data processing workflows | By providing observability through logs, metrics, and alerts for pipelines and resources. |
| How does Azure Synapse enable real-time analytics | Through integration with Azure Stream Analytics for real-time data processing. |
| How does Azure Synapse handle big data processing | By integrating Spark processing within the same platform as SQL-based analytics. |
| How does Azure Synapse Link support operational analytics | By enabling hybrid transactional and analytical processing on operational databases in real time. |
| How does PolyBase assist in Azure Synapse Analytics | By allowing the direct query of external data sources using T-SQL, without moving the data. |
| What are Synapse Notebooks used for in Azure Synapse | To run and manage code written in Python, Spark SQL, Scala, or .NET for data analytics. |
| What feature of Azure Data Factory supports scheduling and monitoring workflows | Integration with Azure Monitor and time-based triggers. |
| What is a Data Lake | A centralized repository that holds large volumes of raw data in its native format until it is needed. |
| What is a Delta Table in Azure Databricks | A storage format that enables efficient updates and ACID transactions on data lakes. |
| What is a Lakehouse in Azure Databricks | A combined approach using data lake storage and data warehouse features for unified analytics. |
| What is a Synapse Dedicated SQL Pool | A provisioned data warehouse service optimized for high-performance analytics. |
| What is a Synapse pipeline | A workflow in Azure Synapse that integrates data processes and analytics tasks. |
| What is an Event Hub used for in Azure data processing | To ingest large volumes of data streams in real time for analytics or storage. |
| What is an Integration Runtime in Azure Data Factory | The compute infrastructure Azure Data Factory uses to run data movement and data transformation activities. |
| What is Apache Spark used for in Azure Databricks | A distributed data processing engine for big data analytics and machine learning. |
| What is AutoML in Azure Databricks | An automated machine learning feature to build predictive models with minimal coding. |
| What is Azure Data Factory | A data integration service for creating workflows to move and transform data at scale. |
| What is Azure Data Lake Storage Gen2 | A scalable data storage solution optimized for big data analytics with hierarchical namespace support. |
| What is Azure Databricks | A collaborative Apache Spark-based analytics platform for big data. |
| What is Azure Synapse Analytics | A cloud analytics service for big data and data warehousing. |
| What is Delta Lake in Azure Databricks | A storage layer that provides ACID transactions and reliable data lakes. |
| What is Incremental Processing in Azure Data Factory | A method to process only the new or updated data, optimizing pipeline performance. |
| What is Mapping Data Flow Debug in Azure Data Factory | A feature to preview and test your transformations before executing the pipeline. |
| What is Row-Level Security in Synapse Dedicated SQL Pool | A feature to restrict data access based on user roles for enhanced security. |
| What is Serverless SQL Pool in Azure Synapse | A distributed query engine that allows on-demand querying of data in a data lake without prior data preparation. |
| What is SQL On-demand analytics in Azure Synapse | A feature to run T-SQL queries on data in Azure Data Lake with no infrastructure setup. |
| What is the Data Explorer in Azure Synapse | A tool for interactive and fast analytics on log and telemetry data in storage. |
| What is the Data Flow Designer in Azure Data Factory | A design interface for visually building and configuring data transformation logic. |
| What is the difference between Copy Activity and Data Flow in Azure Data Factory | Copy Activity moves data between sources, while Data Flow transforms data at scale. |
| What is the difference between Linked Services and Datasets in Azure Data Factory | Linked Services define connectivity to external resources, while Datasets represent the data structure within those connections. |
| What is the purpose of a Data Flow in Azure Data Factory | To perform data transformations without writing code. |
| What is the purpose of a Workspace in Azure Databricks | To organize and collaborate on notebooks, jobs, and libraries. |
| What is the purpose of Integration Pipelines in Azure Synapse | To orchestrate and automate data movement and processing workflows across the platform. |
| What is the purpose of Job Clusters in Azure Databricks | Temporary clusters created for running automated tasks and terminated after execution to save costs. |
| What is the purpose of Triggered Pipelines in Azure Data Factory | To automate data workflows that run based on specific events or schedules. |
| What is the role of Cosmos DB in Azure data processing | A globally distributed, multi-model database that serves as a low-latency operational store and, via Azure Synapse Link, a source for near-real-time analytics. |
| What is the Workspace Repository in Azure Databricks used for | To manage notebooks, jobs, and libraries in a centralized repository for collaboration. |
| What tool in Azure Synapse enables Power BI integration | A linked Power BI workspace in Synapse Studio, allowing reports to be built and viewed without leaving the platform. |
| What type of cluster is used in Azure Databricks for machine learning | GPU-enabled clusters for faster processing of ML tasks. |
| Which programming languages are supported in Azure Databricks notebooks | Python, Scala, R, and SQL. |
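Several cards above describe incremental processing in Azure Data Factory: loading only rows that are new or changed since the last run, typically tracked with a watermark column. The sketch below illustrates that high-water-mark pattern in plain Python with a hypothetical in-memory `source` table and `incremental_load` helper; these names are illustrative, not part of any Azure API.

```python
from datetime import datetime

# Hypothetical source table: each row carries a last_modified timestamp
# that the pipeline uses as its watermark column.
source = [
    {"id": 1, "value": "a", "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "value": "b", "last_modified": datetime(2024, 1, 5)},
    {"id": 3, "value": "c", "last_modified": datetime(2024, 1, 9)},
]

def incremental_load(source_rows, watermark):
    """Return only rows modified after the stored watermark,
    plus the new watermark to persist for the next run."""
    new_rows = [r for r in source_rows if r["last_modified"] > watermark]
    new_watermark = max((r["last_modified"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

# First run: the watermark starts far in the past, so every row loads.
rows, wm = incremental_load(source, datetime.min)
print(len(rows))  # 3

# Later run: only rows changed after the stored watermark load.
rows, wm = incremental_load(source, wm := datetime(2024, 1, 4))
print(len(rows))  # 2
```

In a real pipeline the watermark would be persisted (e.g., in a control table) between runs; the same pattern underlies change-data-capture style loads across Copy Activities and Data Flows.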
About the Flashcards
Flashcards for the Microsoft Azure Data Fundamentals exam guide you through Microsoft Azure's modern data stack, from Synapse Analytics warehouses to Databricks Spark-powered notebooks. Each card reinforces key definitions, service purposes, and architectural roles, helping you connect ingestion, transformation, storage, and analytics components tested on the exam.
Review pipeline orchestration in Azure Data Factory, real-time streaming with Event Hubs, and structured security features such as row-level permissions in dedicated SQL pools. The deck also drills into Delta Lake reliability, AutoML capabilities, and workspace collaboration, while clarifying notebook language support across Python, Scala, R, and SQL. Concise explanations and comparisons make last-minute revision efficient.
Topics covered in this flashcard deck:
- Azure Synapse Analytics
- Azure Databricks & Spark
- Azure Data Factory pipelines
- Delta Lake & Data Lakes
- Real-time ingestion & monitoring
- Security & governance