Microsoft Fabric Data Engineer Associate Practice Test (DP-700)

Use the form below to configure your Microsoft Fabric Data Engineer Associate Practice Test (DP-700). The practice test can be configured to only include certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

  • Questions: number of questions in the practice test (free users are limited to 20 questions; upgrade for unlimited)
  • Seconds Per Question: determines how long you have to finish the practice test
  • Exam Objectives: which exam objectives should be included in the practice test

Microsoft Fabric Data Engineer Associate DP-700 Information

The Microsoft Fabric Data Engineer Associate (DP-700) exam shows that you know how to work with data in Microsoft Fabric. It tests your ability to collect, organize, and prepare data so it can be used for reports and dashboards. Passing the DP-700 means you can build and manage data pipelines, work with Fabric tools such as lakehouses, notebooks, and Power BI, and make sure data is clean and ready for analysis.

This exam is best for people who already have some experience working with data or databases and want to move into a data engineering role. If you enjoy working with numbers, building reports, or using SQL and Python to manage data, this certification can help you stand out to employers. It’s designed for anyone who wants to show their skills in data handling using Microsoft tools.

Before taking the real exam, it’s smart to use DP-700 practice exams, practice tests, and practice questions to prepare. These tools help you get used to the types of questions you’ll see on test day and show which topics you need to study more. By using practice tests often, you can build confidence, improve your score, and walk into the exam knowing what to expect.

  • Free Microsoft Fabric Data Engineer Associate DP-700 Practice Test
  • 20 Questions
  • Unlimited time
  • Exam objectives:
    Implement and manage an analytics solution
    Ingest and transform data
    Monitor and optimize an analytics solution
Question 1 of 20

In a Microsoft Fabric notebook, you are building a PySpark Structured Streaming query that must read real-time clickstream events from Azure Event Hubs and land them in a lakehouse so that Power BI Direct Lake reports can query the data immediately. The solution must guarantee exactly-once processing after restarts and minimize small-file creation. Which writeStream configuration should you use?

  • .writeStream.format("delta").outputMode("append").option("checkpointLocation","/chkpt").start("Tables.clicks")

  • .writeStream.format("delta").outputMode("complete").option("checkpointLocation","/chkpt").start("/lakehouse/tables/clicks")

  • .writeStream.format("delta").outputMode("append").option("checkpointLocation","/chkpt").option("triggerOnce","true").start("Tables.clicks")

  • .writeStream.format("parquet").outputMode("append").option("checkpointLocation","/chkpt").start("/lakehouse/files/clicks")

Question 2 of 20

You are a Contributor in a Microsoft Fabric workspace that has item-level security enabled. You need to ensure that only the security group named DataScienceTeam can view and use a specific lakehouse, while other workspace items remain accessible to all workspace Members. Which action should you take to meet this requirement?

  • Open the lakehouse's Manage permissions pane yourself, remove inherited permissions, and assign the Viewer role to the DataScienceTeam group.

  • Apply a Confidential sensitivity label to the lakehouse and rely on Microsoft Purview to prevent unauthorized users from opening it.

  • Change the workspace access mode from Public to Private so that only explicitly added users, including DataScienceTeam, can access any item in the workspace.

  • Request a workspace Owner to break permission inheritance for the lakehouse, remove other users, and assign the Viewer role to the DataScienceTeam group.

Question 3 of 20

You manage a Microsoft Fabric workspace that contains a lakehouse named ContosoSales. Historical sales files are stored in an Azure Data Lake Storage Gen2 (ADLS Gen2) account that is not linked to Fabric. You must expose the folder /raw/sales/ in ADLS Gen2 to users of ContosoSales so that they can query the data from SQL analytics endpoints without copying the files into OneLake. Security administrators insist that:

  • Access to the ADLS Gen2 account continues to be controlled with Azure role-based access control (Azure RBAC).
  • Finance analysts who already have Storage Blob Data Reader rights on the ADLS Gen2 container must automatically be able to query the data from Fabric.

Which action should you perform when you create the shortcut in ContosoSales to meet these requirements?

  • Create the shortcut with anonymous (public) access and rely on Fabric workspace roles for security.

  • Configure the shortcut to use Azure Active Directory passthrough authentication.

  • Copy the files into a new managed table in the lakehouse and grant Fabric item-level permissions instead of creating a shortcut.

  • Configure the shortcut to use a service principal that has Storage Blob Data Reader rights on the container.

Question 4 of 20

You are designing a new fact table named SalesTransactions in a Microsoft Fabric data warehouse. The table will ingest about 500 million rows weekly. Business analysts frequently join SalesTransactions to the Customer dimension table on the CustomerID column and then aggregate sales per customer. To reduce data movement during these joins and achieve the best query performance, which distribution option should you choose when creating the SalesTransactions table?

  • Define the table as ROUND_ROBIN distributed across compute nodes.

  • Define the table as HASH distributed on the CustomerID column.

  • Define the table as HASH distributed on the transaction Date column.

  • Define the table as REPLICATE distributed to every compute node.

Question 5 of 20

You manage a Microsoft Fabric workspace that contains a semantic model named Contoso Sales. The model is configured to refresh every hour. The data-quality team must receive an email whenever a scheduled refresh ends with a Failure status. You need a solution that can be configured directly in the Fabric (Power BI) service without building a custom monitoring pipeline. What should you do?

  • Open the semantic model's Scheduled refresh settings, enable refresh failure notifications, and add the data-quality distribution list as email recipients.

  • Pin the semantic model's refresh history to a dashboard and configure a data alert on the visual so that it emails the data-quality team when the value equals Failure.

  • Create an Azure Monitor metric alert that fires when the semantic model's refresh duration metric exceeds a threshold and add the distribution list to the action group.

  • Configure a Fabric capacity usage alert for the workspace and specify the distribution list as notification recipients.

Question 6 of 20

You are designing a daily process in Microsoft Fabric to retrieve foreign-exchange rates. The process must read a list of currency codes from a table, call an external REST API once for each code, perform complex JSON parsing that relies on a custom Python library, and then write the results as Delta tables in a Lakehouse. Developers want to run and debug the logic interactively before scheduling it in production. Which Fabric item should you choose to implement the core data-transformation logic?

  • Develop a Dataflow Gen2 that retrieves the API data and performs the parsing and write-back operations.

  • Create a Spark notebook and schedule it (or call it from a pipeline) to handle the loop, API calls, parsing, and writes.

  • Use a Data Factory pipeline that executes a stored procedure in a Fabric warehouse to perform the API calls and transformations.

  • Build a Data Factory pipeline that uses a Copy activity combined with a Mapping Data Flow to ingest and transform the API responses.
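
As a rough sketch of why a notebook fits this pattern, the loop, per-code API call, and Delta write can all live in ordinary Python/PySpark; the driver table, endpoint URL, and response fields below are invented for illustration:

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical driver table that holds the list of ISO currency codes.
    codes = [row["code"] for row in spark.table("currency_codes").collect()]

    rows = []
    for code in codes:
        # Hypothetical REST endpoint; a custom parsing library can be imported
        # and used here like any other Python package available in the session.
        resp = requests.get(f"https://api.example.com/fx/{code}", timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        rows.append((code, float(payload["rate"]), payload["asOf"]))

    # Land the parsed results as a Delta table in the lakehouse.
    df = spark.createDataFrame(rows, ["currency_code", "rate", "as_of"])
    df.write.format("delta").mode("overwrite").saveAsTable("fx_rates")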

Question 7 of 20

You created a Microsoft Fabric workspace in live (default) mode to develop a new lakehouse and several Data Factory pipelines. Your team now requires full source control so that each change can be committed to a feature branch and reviewed before it reaches the main branch. You must also keep the existing workspace items and continue development without losing any work. What should you do first?

  • Create a new Fabric workspace in Git mode, export the current artifacts as .pbip files, and import them into the new workspace.

  • Enable change tracking at the lakehouse level and configure the Data Factory pipelines to use published versions only.

  • Configure a deployment pipeline for the current workspace and add a new Development stage linked to the feature branch.

  • Switch the existing workspace from Live mode to Git mode and connect it to the team's Azure DevOps repository and development branch.

Question 8 of 20

You are building a data solution in Microsoft Fabric. Whenever a new CSV file arrives in a specific OneLake folder, you must automatically run a short PySpark notebook that cleans the data and appends it to a Lakehouse table. If the notebook fails, a notification must be posted to a Microsoft Teams channel. You want a low-code approach that provides built-in monitoring and retry capabilities. Which Fabric component should you implement?

  • Define a Delta Live Tables pipeline inside a notebook and schedule it with the Fabric SQL job scheduler, adding Teams notification logic in SQL.

  • Schedule the PySpark notebook to run every few minutes and add Python code that polls the folder and calls the Microsoft Teams API to post a message on errors.

  • Create a Data Factory pipeline that uses an event trigger on the OneLake folder, executes the PySpark notebook, and adds a conditional step to send the Teams notification on failure.

  • Configure a stored procedure in the Lakehouse SQL endpoint and invoke it through a OneLake event grid trigger to process the file and send Teams alerts.

Question 9 of 20

You manage a Microsoft Fabric workspace that receives daily CSV files into a lakehouse bronze folder. The business wants a reusable transformation that non-developer analysts can build in the Power BI service without writing code. The solution must join the files, filter rows, create aggregates, and write the results to a dimension table in the same lakehouse. Which Fabric component should you use to build the transformation?

  • Implement a T-SQL stored procedure in a Fabric warehouse and call it from a pipeline.

  • Create a Dataflow Gen2 that loads its output to the lakehouse table.

  • Write a KQL query in a Real-Time Analytics database and materialize the results.

  • Develop a Spark notebook with PySpark transformations and schedule it in a pipeline.

Question 10 of 20

You are troubleshooting an hourly Dataflow Gen2 that transforms JSON files from a landing-zone folder and loads the results into a curated table. The last two refreshes failed. You need to determine the exact applied step that produced the error and review how many rows were processed immediately before the failure. Which action should you take in Microsoft Fabric?

  • Open the dataflow's Refresh history page and inspect the run-details pane for the failed refresh.

  • Review the pipeline activity run details in the Monitor hub.

  • Query the lakehouse's _sys.diag_execution_log table for the dataflow run to retrieve per-step metrics.

  • Enable query diagnostics in the Power Query editor and review the generated diagnostics tables.

Question 11 of 20

In a Microsoft Fabric notebook, you develop a PySpark Structured Streaming query that reads JSON events from Azure Event Hubs and writes five-minute order totals to a Delta table located in the Lakehouse's bronze zone. The solution must resume automatically with no data loss if the notebook runtime is restarted. When you configure the writeStream operation, which option must you set to meet the requirement?

  • Set option("mergeSchema", "true") on the writeStream sink.

  • Set option("checkpointLocation", "abfss://lakehouse/checkpoints/orders") on the writeStream sink.

  • Set option("ignoreChanges", "true") on the writeStream sink.

  • Set option("maxFilesPerTrigger", "1") on the writeStream sink.

Question 12 of 20

You are a Microsoft Fabric administrator for Contoso. Your organization has enabled data domains and created several domain workspaces. A data engineering team discovers that they cannot assign a non-member security group to the 'Data Curators' role in one of the domain workspaces. You need to explain why this happens and how to allow the team to delegate curation tasks to that security group. Which action should you recommend?

  • Change the workspace from a domain workspace back to a personal workspace.

  • Move the workspace to a different Fabric capacity in the same tenant.

  • Add the security group as a member of the domain workspace and then assign it to the Data Curators role.

  • Enable Git-based version control for the workspace.

Question 13 of 20

You are designing a Microsoft Fabric solution to ingest a nightly 200-GB batch of structured sales data that lands in Azure Data Lake Storage Gen2. Analysts query the data in Power BI and need fully ACID transactions, time-travel auditing, and automatic index and statistics maintenance for complex T-SQL joins. Which Fabric data store should you load the sales data into to best satisfy these requirements?

  • Azure Cosmos DB for NoSQL

  • Fabric Real-Time Intelligence KQL database

  • Fabric Lakehouse

  • Fabric Warehouse

Question 14 of 20

You manage a Microsoft Fabric workspace that contains a certified dataset named Sales Model. An external analyst must be able to connect to the dataset from Excel and Power BI Desktop and publish their own reports to a different workspace. The analyst must not be able to edit the dataset itself or see any other items in your workspace. Which item-level permission should you assign to the analyst on Sales Model?

  • Build (Read, reshare, and build)

  • Admin

  • Read

  • Contribute

Question 15 of 20

You manage a Fabric Eventstream that ingests 100,000 telemetry events per second from IoT devices into an Eventhouse table named Telemetry. Analysts complain that KQL queries filtered to the most recent hour take several seconds to return. Investigation shows that ingestion generates thousands of small extents (<50 MB). Without provisioning additional capacity, which configuration change is most likely to improve both ingestion efficiency and query performance?

  • Increase the target extent size by editing the ingestion batching policy for the Telemetry table.

  • Create a materialized view that summarizes the Telemetry table every five minutes.

  • Enable delta ingestion mode on the Eventstream output to the Telemetry table.

  • Reduce the table data retention period from 30 days to 7 days.

Question 16 of 20

You work in a Microsoft Fabric lakehouse. The Sales table has about 500 million rows, and the ProductSubcategory and ProductCategory tables each have fewer than 1,000 rows. You must build a daily Gold-layer table that denormalizes Sales with subcategory and category attributes while minimizing network shuffle and keeping the join in memory. Which Spark technique should you apply before running the joins?

  • Repartition the Sales DataFrame to a single partition, then perform the joins sequentially.

  • Disable Adaptive Query Execution so that Spark resorts to default shuffle hash joins.

  • Combine the three DataFrames with unionByName() and apply filters afterward.

  • Use the Spark broadcast() function (or BROADCAST join hint) on the two small lookup DataFrames before joining them to Sales.
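
A minimal sketch of that broadcast pattern; the join keys and output table name are assumptions for illustration:

    from pyspark.sql.functions import broadcast

    # `spark` is the SparkSession provided by the Fabric notebook session.
    sales = spark.table("Sales")
    subcategory = spark.table("ProductSubcategory")
    category = spark.table("ProductCategory")

    # Broadcasting the two small lookup tables copies them to every executor,
    # so the 500-million-row Sales table is never shuffled for these joins.
    gold = (
        sales
        .join(broadcast(subcategory), "ProductSubcategoryKey")  # assumed join key
        .join(broadcast(category), "ProductCategoryKey")        # assumed join key
    )

    gold.write.format("delta").mode("overwrite").saveAsTable("gold_sales_denormalized")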

Question 17 of 20

You are designing a streaming ingestion solution in Microsoft Fabric Real-Time Intelligence. IoT devices send 5,000 events per second through Azure Event Hubs. Analysts must run KQL queries with sub-second latency over the most recent 30 days of data. The events will not be stored in any other system. Which storage option should you configure as the Eventstream destination to meet these requirements?

  • A shortcut that references Delta Lake files in an existing Fabric Lakehouse

  • Mirrored tables that replicate data from Azure SQL Database

  • Native tables (native storage) in Real-Time Intelligence

  • Mirrored tables that replicate data from Azure Cosmos DB

Question 18 of 20

An IoT gateway continuously writes raw CSV files to the /ingest/raw/ folder of a Microsoft Fabric Lakehouse. A PySpark notebook must run immediately after each new file arrives to cleanse the data and append it to a Delta table, without rerunning for earlier files. Which trigger type should you configure on the pipeline that calls the notebook?

  • Configure an event trigger that fires on the creation of a new file in the /ingest/raw/ folder.

  • Configure a tumbling-window trigger with a one-hour interval.

  • Rely on a manual trigger that operators can start after verifying file arrival.

  • Configure a scheduled trigger that runs the pipeline every five minutes.

Question 19 of 20

You need to receive an email whenever a specific pipeline in a Microsoft Fabric Data Factory workspace runs for longer than 15 minutes. You want the solution to rely only on built-in Fabric capabilities and require the least ongoing administration. Which approach should you use?

  • Configure a data alert on a Power BI dashboard tile that displays the pipeline's execution time.

  • Create a threshold alert in the Monitoring hub by using an Activator that triggers when the pipeline run duration exceeds 15 minutes.

  • Add a SQL trigger on the lakehouse execution log table to send an email when a run lasts more than 15 minutes.

  • Create an Azure Monitor metric alert rule that evaluates the PipelineRunDuration metric for the workspace.

Question 20 of 20

You ingested SalesFact, ProductDim (about 500,000 rows), and CategoryDim (30 rows) into a Microsoft Fabric lakehouse. Each product belongs to one category. To avoid extra joins in Power BI, you need to merge ProductDim and CategoryDim into a ProductExtended Delta table by using a PySpark notebook. Which approach minimizes shuffle and stays scalable as data volumes grow?

  • Repartition both tables to a single partition, perform a standard inner join, and then write the output.

  • Create a view that joins ProductDim and CategoryDim at query time and expose the view to Power BI.

  • Use a broadcast join in PySpark to join ProductDim with CategoryDim, then write the result to the ProductExtended Delta table.

  • Run a CROSS JOIN between ProductDim and CategoryDim and apply a filter on matching keys before writing the result.
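
For reference, a hedged PySpark sketch of that approach, with the join key assumed for illustration:

    from pyspark.sql.functions import broadcast

    # `spark` is the SparkSession provided by the Fabric notebook session.
    product_dim = spark.table("ProductDim")
    category_dim = spark.table("CategoryDim")  # ~30 rows, an ideal broadcast candidate

    # Broadcasting the tiny dimension avoids shuffling the larger ProductDim,
    # and the same plan keeps working as ProductDim grows.
    product_extended = product_dim.join(broadcast(category_dim), "CategoryID")  # assumed key

    (product_extended
        .write
        .format("delta")
        .mode("overwrite")
        .saveAsTable("ProductExtended"))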