
Microsoft Fabric Data Engineer Associate Practice Test (DP-700)

Use the form below to configure your Microsoft Fabric Data Engineer Associate Practice Test (DP-700). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

  • Questions: the number of questions in the practice test (free users are limited to 20 questions; upgrade for unlimited).
  • Seconds Per Question: determines how long you have to finish the practice test.
  • Exam Objectives: which exam objectives should be included in the practice test.

Microsoft Fabric Data Engineer Associate DP-700 Information

The Microsoft Fabric Data Engineer Associate (DP-700) exam shows that you know how to work with data in Microsoft Fabric. It tests your ability to collect, organize, and prepare data so it can be used for reports and dashboards. Passing the DP-700 means you can build and manage data pipelines, work with tools like Spark notebooks, lakehouses, and Power BI, and make sure data is clean and ready for analysis.

This exam is best for people who already have some experience working with data or databases and want to move into a data engineering role. If you enjoy working with numbers, building reports, or using SQL and Python to manage data, this certification can help you stand out to employers. It’s designed for anyone who wants to show their skills in data handling using Microsoft tools.

Before taking the real exam, it’s smart to use DP-700 practice exams, practice tests, and practice questions to prepare. These tools help you get used to the types of questions you’ll see on test day and show which topics you need to study more. By using practice tests often, you can build confidence, improve your score, and walk into the exam knowing what to expect.

  • Free Microsoft Fabric Data Engineer Associate DP-700 Practice Test

  • 20 Questions
  • Unlimited time
  • Implement and manage an analytics solution
  • Ingest and transform data
  • Monitor and optimize an analytics solution
Question 1 of 20

You are setting up a new workspace in Microsoft Fabric for a team of data scientists. The team often runs ad-hoc Spark notebooks that need more executors than the default configuration, but you want to avoid users selecting an oversized engine each time. In the Workspace settings pane you decide to change only one Spark-related setting so that, by default, all new interactive Spark sessions in the workspace start with 16 cores and 128 GB of memory, while still allowing users to choose a different size at run time if needed. Which workspace-level setting should you modify?

  • Session idle timeout

  • Enable custom libraries

  • Default engine size

  • Maximum concurrent Spark sessions per user

Question 2 of 20

You manage a Microsoft Fabric workspace that contains a Data Factory pipeline named LoadSales that ingests point-of-sale files into a lakehouse. The pipeline must run every hour on the hour. If the Fabric service is unavailable for several hours, any missed executions must run automatically in order when service resumes, without overlapping runs. Which trigger type should you configure for the pipeline?

  • A schedule trigger configured with an hourly recurrence

  • An event trigger that listens for Azure Storage blob-created events

  • A tumbling window trigger with a one-hour window and dependency handling

  • A manual trigger invoked through a webhook on demand

Question 3 of 20

You use a Fabric notebook to load daily CSV files into a Delta Lake table named sales.Order. The source sometimes retransmits old files, creating duplicates, and late-arriving rows for an order line can appear days later. You must keep only the latest record for each (OrderID, LineNumber) pair and allow the notebook to be rerun safely without adding duplicates. Which strategy meets these requirements?

  • Create a staging view that uses ROW_NUMBER() OVER (PARTITION BY OrderID, LineNumber ORDER BY EventDate DESC) to keep the latest row, then MERGE the view into sales.Order on OrderID and LineNumber.

  • Use COPY INTO sales.Order with FILEFORMAT = 'CSV' and enable constraint checks so duplicate keys are rejected during loading.

  • TRUNCATE TABLE sales.Order at the start of every run and INSERT all current files to refresh the table contents.

  • Append all rows with INSERT INTO sales.Order, then run OPTIMIZE ZORDER BY (OrderID, LineNumber) after each load to remove duplicates.
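
For context, here is a minimal Spark SQL sketch of the staging-view-plus-MERGE pattern described in the first option. It assumes the day's files have already been loaded into a temporary view named staged_orders; the non-key columns (Quantity, Amount) are illustrative.

-- Keep only the most recent version of each (OrderID, LineNumber) pair.
CREATE OR REPLACE TEMPORARY VIEW latest_orders AS
SELECT OrderID, LineNumber, EventDate, Quantity, Amount
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY OrderID, LineNumber
               ORDER BY EventDate DESC) AS rn
    FROM staged_orders AS s
) AS dedup
WHERE rn = 1;

-- Upsert into the Delta table; matching on the business key makes reruns idempotent.
MERGE INTO sales.Order AS tgt
USING latest_orders AS src
    ON tgt.OrderID = src.OrderID AND tgt.LineNumber = src.LineNumber
WHEN MATCHED THEN UPDATE SET
    tgt.EventDate = src.EventDate,
    tgt.Quantity  = src.Quantity,
    tgt.Amount    = src.Amount
WHEN NOT MATCHED THEN INSERT
    (OrderID, LineNumber, EventDate, Quantity, Amount)
    VALUES (src.OrderID, src.LineNumber, src.EventDate, src.Quantity, src.Amount);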

Question 4 of 20

You manage a Microsoft Fabric workspace that contains several semantic models. A model named SalesModel is configured to refresh every eight hours. Business users report that some refreshes have failed during the past month. You must review all refresh attempts from the last 30 days and see, for each attempt, the start time, end time, duration, status, and any available failure details. Which pane should you open from the SalesModel's settings page to obtain this information?

  • The Microsoft 365 audit log for Power BI activities

  • The Power BI (Fabric) Capacity Metrics app

  • The semantic model's Monitoring pane in the workspace

  • The semantic model's Refresh history pane

Question 5 of 20

In a Microsoft Fabric workspace, you want a group of data stewards to monitor the execution status of existing Data Factory pipelines and view or interact with published reports. They must not be able to run pipelines, create or modify reports or notebooks, schedule dataflow refreshes, or change workspace settings or membership. Which workspace role should you assign to these users?

  • Viewer

  • Contributor

  • Member

  • Admin

Question 6 of 20

You manage a Microsoft Fabric workspace that contains a data warehouse named SalesDW. Your DevOps team wants every table, view, and stored procedure in the warehouse to be stored in a Git repository so that schema changes can be reviewed and deployed through an Azure DevOps pipeline to test and production workspaces. What is the most appropriate first step to create a deployable artifact that captures the current warehouse schema?

  • Export SalesDW from the Fabric portal as a Power BI project (.pbip) and push it to the Git repository.

  • Generate an Azure Resource Manager (ARM) template for the warehouse item from the Azure portal and store it in Git.

  • Execute a T-SQL BACKUP DATABASE command in a Fabric notebook and add the backup file to source control.

  • Use Visual Studio Code with the SQL Database Projects extension to import SalesDW and build a .dacpac file.

Question 7 of 20

You manage a Microsoft Fabric lakehouse that stores a Delta table named SalesSilver. Each day, a landing folder receives a CSV file containing only new or changed rows identified by the TransactionID key column. You must design an incremental loading pattern that:

  • Inserts new rows and updates existing rows.
  • Minimizes code complexity and preserves Delta table history for time travel.

Which approach should you implement?

  • Create a dataflow that deletes all rows from SalesSilver and then appends the full dataset from the landing folder.

  • Use a Spark notebook that executes a MERGE INTO SalesSilver AS tgt USING the incoming dataset AS src ON tgt.TransactionID = src.TransactionID, with UPDATE and INSERT clauses.

  • Run a Spark Structured Streaming job in trigger-once mode that writes the incoming data to SalesSilver using outputMode "append".

  • Configure a pipeline Copy activity to load the CSV file into SalesSilver each day in append mode.
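
For context, a minimal Spark SQL sketch of the MERGE-based pattern spelled out in the second option, assuming the day's CSV has already been read into a temporary view named incoming whose schema matches SalesSilver:

-- Insert new TransactionIDs and update existing ones in place; the Delta
-- transaction log keeps every version, so time travel remains available.
MERGE INTO SalesSilver AS tgt
USING incoming AS src
    ON tgt.TransactionID = src.TransactionID
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;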

Question 8 of 20

You work in a Microsoft Fabric lakehouse. The Sales table has about 500 million rows, and the ProductSubcategory and ProductCategory tables each have fewer than 1,000 rows. You must build a daily Gold-layer table that denormalizes Sales with subcategory and category attributes while minimizing network shuffle and keeping the join in memory. Which Spark technique should you apply before running the joins?

  • Repartition the Sales DataFrame to a single partition, then perform the joins sequentially.

  • Disable Adaptive Query Execution so that Spark resorts to default shuffle hash joins.

  • Use the Spark broadcast() function (or BROADCAST join hint) on the two small lookup DataFrames before joining them to Sales.

  • Combine the three DataFrames with unionByName() and apply filters afterward.
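
For context, a hedged Spark SQL sketch of the broadcast approach; the BROADCAST hint is the SQL counterpart of the broadcast() DataFrame function, and the column names are illustrative:

-- Ship the two small lookup tables to every executor so the 500-million-row
-- Sales table is joined without a shuffle.
SELECT /*+ BROADCAST(sc, c) */
       s.SalesOrderID,
       s.Quantity,
       sc.SubcategoryName,
       c.CategoryName
FROM Sales AS s
JOIN ProductSubcategory AS sc
    ON s.ProductSubcategoryID = sc.ProductSubcategoryID
JOIN ProductCategory AS c
    ON sc.ProductCategoryID = c.ProductCategoryID;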

Question 9 of 20

Your organization ingests daily CSV sales files into a Delta table in Microsoft Fabric by using a Dataflow Gen2. Some days, the upstream system resends one or more previous files, causing duplicate rows in the target table. You must modify the solution so that reprocessing the same file does not create duplicates and avoids a full table reload. What should you do?

  • Add a distinct transformation on all columns in the dataflow before the sink.

  • Change the dataflow to load the files into a new table each day and use UNION ALL to combine the tables.

  • Enable the Auto Compaction feature for the Delta table.

  • Replace the dataflow sink with a Fabric notebook that executes a Delta MERGE statement matching on TransactionID.

Question 10 of 20

While testing a report in a Microsoft Fabric warehouse, you run this T-SQL:

SELECT OrderDate, COUNT(*) AS TotalOrders
FROM dbo.SalesOrders
WHERE OrderStatus = 'Completed';

It returns error 8120, stating that column 'OrderDate' must appear in the GROUP BY clause. You need the simplest fix so the query returns one row per day with the correct count. Which change should you make?

  • Add a GROUP BY OrderDate clause to the statement.

  • Insert the DISTINCT keyword after SELECT to eliminate duplicates.

  • Replace COUNT(*) with COUNT(OrderDate) to include the column in the aggregate.

  • Rewrite the aggregate as COUNT(*) OVER (PARTITION BY OrderDate).
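
For reference, this is what the statement looks like with a GROUP BY clause added, which is the smallest change that yields one row per day:

SELECT OrderDate, COUNT(*) AS TotalOrders
FROM dbo.SalesOrders
WHERE OrderStatus = 'Completed'
GROUP BY OrderDate;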

Question 11 of 20

You are creating a workspace in Microsoft Fabric for a team of analysts who will build multiple Dataflows Gen2 that ingest files from an AWS S3 bucket into OneLake every night. To ensure that each dataflow can automatically create its own staging and destination lakehouse without manual configuration, which workspace Data workflow setting should you enable before the analysts start building the dataflows?

  • Enable "Validate queries before saving dataflows".

  • Enable "Use dataflow staging capacity during refresh".

  • Disable "Allow dataflows to create new items" and instruct authors to select existing lakehouses.

  • Enable "Auto create new lakehouse for dataflow output".

Question 12 of 20

You are designing a lakehouse in Microsoft Fabric. To avoid duplicating data files that already reside in an Azure Data Lake Storage Gen2 account managed by another team, you decide to surface the folder /raw/finance in your lakehouse by using a shortcut. After the shortcut is created, analytics engineers will load the data with Spark, but the source team wants to guarantee that their files cannot be modified from your workspace.

Which statement about Fabric shortcuts satisfies the source team's requirement?

  • Shortcuts that reference Azure Data Lake Storage Gen2 are read-only, so Spark sessions in the lakehouse can read the files but cannot write or delete them.

  • Write access is controlled by the lakehouse item role assignment, not by the shortcut type, so you must remove the Engineer role to prevent changes.

  • Any shortcut becomes writable once the workspace owner is granted Contributor rights on the target storage account.

  • Fabric automatically creates a versioned copy of the target folder; engineers write to the copy while the original files stay untouched.

Question 13 of 20

A stored procedure in a Microsoft Fabric warehouse runs this statement to upsert rows from StgSales into DimCustomer:

MERGE dbo.DimCustomer AS tgt
USING dbo.StgSales AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET tgt.City = src.City, tgt.Region = src.Region
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, City, Region)
    VALUES (src.CustomerID, src.City, src.Region);

Execution fails with the error: "The MERGE statement attempted to UPDATE or DELETE the same row more than once. A target row matched more than one source row."
You must correct the T-SQL so the procedure succeeds while still performing the required updates and inserts.

Which change should you make to the statement?

  • Replace the MERGE with an INSERT statement that uses the ON ERROR clause to ignore conflicts.

  • Rewrite the USING clause to select DISTINCT CustomerID, City, Region from dbo.StgSales before the MERGE is executed.

  • Execute SET IDENTITY_INSERT dbo.DimCustomer ON immediately before running the MERGE.

  • Add the table hint WITH (NOLOCK) to dbo.StgSales in the USING clause.
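
For context, a hedged sketch of what de-duplicating the source in the USING clause (the DISTINCT rewrite) could look like:

MERGE dbo.DimCustomer AS tgt
USING (
    SELECT DISTINCT CustomerID, City, Region
    FROM dbo.StgSales
) AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET tgt.City = src.City, tgt.Region = src.Region
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, City, Region)
    VALUES (src.CustomerID, src.City, src.Region);
-- If the same CustomerID can still arrive with different City or Region
-- values, a ROW_NUMBER()-based "latest row" subquery would be needed instead
-- of DISTINCT so that each target row matches exactly one source row.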

Question 14 of 20

You are ingesting JSON telemetry from an Eventstream into an eventhouse table named sensor_data. The Ingestion errors pane for the eventhouse repeatedly shows failures with the error "Unknown JSON mapping 'sensorMapping'". The table currently uses only the default automatic mapping. What should you do to resolve the ingestion failure and resume data flow?

  • Enable automatic schema update for the eventhouse so that new fields are added during ingestion.

  • Set the Eventstream output format to CSV to disable JSON format detection.

  • Create a JSON ingestion mapping named sensorMapping on the sensor_data table that matches the structure of the incoming messages.

  • Change the Eventstream sink to write the data to a lakehouse Delta table instead of the eventhouse.

Question 15 of 20

You create a new Microsoft Fabric workspace for a team of data engineers who will author notebooks and run Spark jobs. The team wants each interactive notebook session to shut down automatically if the user forgets to end it, so that capacity is released and costs are minimized. Which workspace-level setting should you configure to meet this requirement?

  • Disable the Data Science item type in the workspace.

  • Set the Session inactivity timeout under Spark session settings.

  • Configure the OneLake linked workspace setting.

  • Assign the workspace to a different Fabric capacity SKU.

Question 16 of 20

Your company already lands sensor readings in Delta Lake tables inside an existing Fabric Lakehouse. A new project must run ad-hoc KQL queries over the latest readings with sub-second latency, but the compliance team insists that no additional copies of the data are created. In Real-Time Intelligence, which storage option should you choose for the KQL database that will power the queries?

  • Create a shortcut in the KQL database that points to the Lakehouse Delta tables.

  • Export the Delta tables to Azure Blob Storage and re-ingest the files into a new native KQL database.

  • Ingest the data into a native KQL database table by connecting the Lakehouse as a source.

  • Set up mirroring for the Lakehouse so that the Delta tables are replicated into Real-Time Intelligence.

Question 17 of 20

You manage a Microsoft Fabric workspace that contains a lakehouse named SalesLakehouse. A daily pipeline currently performs a full Copy data activity that loads the entire Sales.Orders table from an on-premises SQL Server database into a Delta table in the lakehouse. The Orders table has more than 50 million rows and continues to grow, causing the load to exceed the available refresh window.
You must modify the pipeline so that only rows that were inserted or updated since the previous run are copied, while keeping development effort to a minimum and avoiding custom Spark code.
Which change should you implement?

  • Create a Dataflow Gen2 that loads the Orders table without incremental refresh and drop the current pipeline.

  • Replace the Copy data activity with a Spark notebook that runs a MERGE INTO statement between the source and the Delta table.

  • Modify the pipeline to truncate the Delta table and reload the complete Orders data set at each run.

  • Enable incremental extraction in the existing Copy data activity by turning on change tracking and setting LastModifiedDate as the watermark column.
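
For context, a hedged sketch of the watermark pattern that an incremental Copy data activity relies on; the @LastWatermark parameter and dbo.WatermarkTable are illustrative names supplied and maintained by the pipeline, not fixed Fabric objects:

-- Source query: copy only rows inserted or updated since the last successful run.
SELECT *
FROM Sales.Orders
WHERE LastModifiedDate > @LastWatermark;

-- After a successful copy, the pipeline advances the stored high-water mark.
UPDATE dbo.WatermarkTable
SET WatermarkValue = (SELECT MAX(LastModifiedDate) FROM Sales.Orders)
WHERE TableName = 'Sales.Orders';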

Question 18 of 20

You manage a Fabric Eventstream that ingests JSON telemetry from Azure IoT Hub and routes the data to an Eventhouse table. After a recent device firmware update, the Eventstream Monitoring dashboard shows a rapid increase in the "Failed to write events" metric for the Eventhouse output, while the "Input events" metric remains steady. Which action should you take first to identify the root cause of the failures?

  • Examine the rejected events in the Eventhouse destination's error store (dead-letter folder).

  • Delete and recreate the Eventhouse output with the "Auto create table" option enabled.

  • Refresh the Eventstream input schema to force automatic column mapping.

  • Stop and restart the Eventstream to clear transient write errors.

Question 19 of 20

You need to design a nightly process that ingests 200 GB of semi-structured JSON sales files from an Azure Storage account into a Microsoft Fabric Lakehouse. The solution must land the files unchanged, instantly expose them to several other Lakehouses without duplication, and then run PySpark code that performs complex joins and writes a cleansed Delta table. Which two Fabric capabilities should you combine to meet these requirements?

  • Use a pipeline Copy activity followed by a dataflow Gen2.

  • Mount the storage account in the Lakehouse and schedule a KQL transformation.

  • Enable mirroring on the storage container and query the mirrored tables with T-SQL.

  • Create a OneLake shortcut to the storage location and run a PySpark notebook.
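
For context, a hedged sketch of the second half of that combination. The question calls for PySpark, but the transformation is shown here in Spark SQL for consistency with the other sketches; the shortcut path (Files/raw_sales) and JSON field names are illustrative:

-- Files/raw_sales is assumed to be a OneLake shortcut to the Azure Storage
-- location, so the JSON files are read in place without being copied.
CREATE OR REPLACE TEMPORARY VIEW raw_sales AS
SELECT * FROM json.`Files/raw_sales/*.json`;

-- Cleanse and persist the result as a Delta table in the Lakehouse.
CREATE OR REPLACE TABLE sales_clean AS
SELECT CAST(orderId AS BIGINT)        AS OrderID,
       CAST(orderDate AS DATE)        AS OrderDate,
       CAST(amount AS DECIMAL(18, 2)) AS Amount
FROM raw_sales
WHERE orderId IS NOT NULL;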

Question 20 of 20

You are designing a data pipeline in Microsoft Fabric that loads operational data into a Lakehouse-based star schema every hour. Dimension tables must retain type-2 history and use surrogate keys that stay unique across all incremental loads. Which action should you implement to prepare the dimension data before the fact tables are loaded?

  • Write the incoming dimension rows in append mode; let a GENERATED ALWAYS IDENTITY column assign surrogate keys automatically during the insert.

  • Use a Delta Lake MERGE statement that matches on the business key, expires the current row, and inserts a new row that receives a new surrogate key whenever any tracked attribute changes.

  • Overwrite the dimension table on every run by using a KQL dataflow that recreates the table from scratch.

  • Load the source table with COPY INTO and keep the original primary key from the operational system as the dimension key.
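
For context, a hedged sketch of one common way to realize the expire-and-insert (type-2) pattern in Spark SQL; the staging view StagedCustomer and the CustomerSK, CustomerBK, City, Region, ValidFrom, ValidTo, and IsCurrent columns are all illustrative:

-- Step 1: expire the current version of any customer whose tracked attributes changed.
MERGE INTO DimCustomer AS tgt
USING StagedCustomer AS src
    ON tgt.CustomerBK = src.CustomerBK AND tgt.IsCurrent = true
WHEN MATCHED AND (tgt.City <> src.City OR tgt.Region <> src.Region) THEN
    UPDATE SET tgt.IsCurrent = false,
               tgt.ValidTo   = current_timestamp();

-- Step 2: insert a new current row for every business key that no longer has
-- one (brand-new keys and the keys just expired), assigning surrogate keys
-- that continue from the highest key already in the dimension.
INSERT INTO DimCustomer
SELECT COALESCE((SELECT MAX(CustomerSK) FROM DimCustomer), 0)
         + ROW_NUMBER() OVER (ORDER BY src.CustomerBK) AS CustomerSK,
       src.CustomerBK,
       src.City,
       src.Region,
       current_timestamp() AS ValidFrom,
       CAST(NULL AS TIMESTAMP) AS ValidTo,
       true AS IsCurrent
FROM StagedCustomer AS src
LEFT JOIN DimCustomer AS cur
    ON cur.CustomerBK = src.CustomerBK AND cur.IsCurrent = true
WHERE cur.CustomerBK IS NULL;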