Microsoft Fabric Data Engineer Associate Practice Test (DP-700)
Use the form below to configure your Microsoft Fabric Data Engineer Associate Practice Test (DP-700). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

Microsoft Fabric Data Engineer Associate DP-700 Information
The Microsoft Fabric Data Engineer Associate (DP-700) exam shows that you know how to work with data in Microsoft Fabric. It tests your ability to collect, organize, and prepare data so it can be used for reports and dashboards. Passing the DP-700 means you can build and manage data pipelines, work with Fabric tools such as lakehouses, notebooks, and Power BI, and make sure data is clean and ready for analysis.
This exam is best for people who already have some experience working with data or databases and want to move into a data engineering role. If you enjoy working with numbers, building reports, or using SQL and Python to manage data, this certification can help you stand out to employers. It’s designed for anyone who wants to show their skills in data handling using Microsoft tools.
Before taking the real exam, it’s smart to use DP-700 practice exams, practice tests, and practice questions to prepare. These tools help you get used to the types of questions you’ll see on test day and show which topics you need to study more. By using practice tests often, you can build confidence, improve your score, and walk into the exam knowing what to expect.

Free Microsoft Fabric Data Engineer Associate DP-700 Practice Test
- 20 Questions
- Unlimited time
- Implement and manage an analytics solution
- Ingest and transform data
- Monitor and optimize an analytics solution
You are setting up a new workspace in Microsoft Fabric for a team of data scientists. The team often runs ad-hoc Spark notebooks that need more executors than the default configuration, but you want to avoid users selecting an oversized engine each time. In the Workspace settings pane you decide to change only one Spark-related setting so that, by default, all new interactive Spark sessions in the workspace start with 16 cores and 128 GB of memory, while still allowing users to choose a different size at run time if needed. Which workspace-level setting should you modify?
Session idle timeout
Enable custom libraries
Default engine size
Maximum concurrent Spark sessions per user
Answer Description
The default resources that an interactive Spark session receives in a Microsoft Fabric workspace are controlled by the Default engine size setting. Setting this value to Large (16 cores, 128 GB) makes every new notebook or interactive Spark job start on that size unless the user explicitly picks another size when they run the code. Other settings such as Session idle timeout, Enable custom libraries, or Maximum concurrent Spark sessions per user influence when sessions shut down, whether extra libraries can be installed, or how many sessions a user can run, but none of them determine the initial number of cores and memory an engine receives.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the purpose of the Default engine size setting in Microsoft Fabric?
How does the Default engine size differ from Session idle timeout?
Can users override the Default engine size for their Spark sessions?
You manage a Microsoft Fabric workspace that contains a Data Factory pipeline named LoadSales that ingests point-of-sale files into a lakehouse. The pipeline must run every hour on the hour. If the Fabric service is unavailable for several hours, any missed executions must run automatically in order when service resumes, without overlapping runs. Which trigger type should you configure for the pipeline?
A schedule trigger configured with an hourly recurrence
An event trigger that listens for Azure Storage blob-created events
A tumbling window trigger with a one-hour window and dependency handling
A manual trigger invoked through a webhook on demand
Answer Description
Tumbling window triggers are time-based and keep state for every discrete, contiguous time slice (window). If a window execution is skipped because the service is unavailable, the trigger automatically backfills the missed windows in chronological order once connectivity is restored, ensuring each window is processed exactly once. Schedule triggers fire only at the specified times and do not automatically replay missed runs. Event triggers respond to external events such as blob creation and are not suitable for strict hourly scheduling. Manual triggers require external calls and provide no automatic recurrence or backfill capability.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a Tumbling Window Trigger?
How does Tumbling Window Trigger differ from Schedule Trigger?
What does 'dependency handling' mean in Tumbling Window Triggers?
You use a Fabric notebook to load daily CSV files into a Delta Lake table named sales.Order. The source sometimes retransmits old files, creating duplicates, and late-arriving rows for an order line can appear days later. You must keep only the latest record for each (OrderID, LineNumber) pair and allow the notebook to be rerun safely without adding duplicates. Which strategy meets these requirements?
Create a staging view that uses ROW_NUMBER() OVER (PARTITION BY OrderID, LineNumber ORDER BY EventDate DESC) to keep the latest row, then MERGE the view into sales.Order on OrderID and LineNumber.
Use COPY INTO sales.Order with FILEFORMAT = 'CSV' and enable constraint checks so duplicate keys are rejected during loading.
TRUNCATE TABLE sales.Order at the start of every run and INSERT all current files to refresh the table contents.
Append all rows with INSERT INTO sales.Order, then run OPTIMIZE ZORDER BY (OrderID, LineNumber) after each load to remove duplicates.
Answer Description
Delta Lake offers ACID transactions and the MERGE INTO command, which performs atomic upserts based on key columns. By creating a temporary view that assigns ROW_NUMBER() over (PARTITION BY OrderID, LineNumber ORDER BY EventDate DESC) and filtering for the first row, you keep just the newest version of each order line. Merging this view into sales.Order updates existing rows and inserts new ones, so repeated notebook runs remain idempotent. COPY INTO does not enforce primary-key uniqueness, OPTIMIZE ZORDER BY only reorders data without removing duplicates, and a TRUNCATE followed by INSERT reloads all data and discards late-arriving rows.
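As a rough sketch of this pattern in Spark SQL, assuming the daily files have already been read into a temporary view named staged_order_lines and that Quantity and UnitPrice stand in for the real order-line attributes:

-- Keep only the newest row for each (OrderID, LineNumber) pair in the staged data
CREATE OR REPLACE TEMP VIEW latest_order_lines AS
SELECT OrderID, LineNumber, EventDate, Quantity, UnitPrice  -- column list assumed to match sales.Order
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderID, LineNumber
               ORDER BY EventDate DESC) AS rn
    FROM staged_order_lines
) AS ranked
WHERE rn = 1;

-- Atomic upsert: rerunning the notebook against the same files produces the same final state
MERGE INTO sales.Order AS tgt
USING latest_order_lines AS src
  ON  tgt.OrderID = src.OrderID
  AND tgt.LineNumber = src.LineNumber
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;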
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the ROW_NUMBER() function, and how does it help in data deduplication?
How does the MERGE INTO command maintain data consistency in Delta Lake?
What are ACID transactions, and why are they important in Delta Lake?
You manage a Microsoft Fabric workspace that contains several semantic models. A model named SalesModel is configured to refresh every eight hours. Business users report that some refreshes have failed during the past month. You must review all refresh attempts from the last 30 days and see, for each attempt, the start time, end time, duration, status, and any available failure details. Which pane should you open from the SalesModel's settings page to obtain this information?
The Microsoft 365 audit log for Power BI activities
The Power BI (Fabric) Capacity Metrics app
The semantic model's Monitoring pane in the workspace
The semantic model's Refresh history pane
Answer Description
Open the Monitoring pane from the semantic model's settings page. The Monitoring pane lists every refresh attempt made in the last 30 days, showing the start and end times, total duration, final status, and any error details captured for failed refreshes. Other tools such as the Refresh history pane, Capacity Metrics app, and Microsoft 365 audit logs provide only limited or aggregate data and do not expose per-refresh duration together with detailed error information.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What data can be found in the Monitoring pane of a semantic model?
How does the Monitoring pane differ from the Refresh history pane?
What other tools can be used to monitor refresh performance in Microsoft Fabric?
In a Microsoft Fabric workspace, you want a group of data stewards to monitor the execution status of existing Data Factory pipelines and view or interact with published reports. They must not be able to run pipelines, create or modify reports or notebooks, schedule dataflow refreshes, or change workspace settings or membership. Which workspace role should you assign to these users?
Viewer
Contributor
Member
Admin
Answer Description
The Member role permits users to view and interact with all items in a workspace and to monitor existing Data Factory pipeline runs. However, Members cannot run pipelines, create or edit items, schedule or trigger dataflow refreshes, or modify workspace settings or access; those capabilities are reserved for higher roles such as Admin. Viewer lacks pipeline-monitoring rights, and Contributor or Admin would provide more privileges than required. Therefore, Member is the least-privilege role that meets the stated requirements.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the difference between a Member and a Viewer role in Microsoft Fabric?
What permissions are included in the Member role in Microsoft Fabric?
Why is the Member role the best fit for monitoring pipelines in Microsoft Fabric?
You manage a Microsoft Fabric workspace that contains a data warehouse named SalesDW. Your DevOps team wants every table, view, and stored procedure in the warehouse to be stored in a Git repository so that schema changes can be reviewed and deployed through an Azure DevOps pipeline to test and production workspaces. What is the most appropriate first step to create a deployable artifact that captures the current warehouse schema?
Export SalesDW from the Fabric portal as a Power BI project (.pbip) and push it to the Git repository.
Generate an Azure Resource Manager (ARM) template for the warehouse item from the Azure portal and store it in Git.
Execute a T-SQL BACKUP DATABASE command in a Fabric notebook and add the backup file to source control.
Use Visual Studio Code with the SQL Database Projects extension to import SalesDW and build a .dacpac file.
Answer Description
Microsoft Fabric data warehouses are compatible with SQL database projects. By using the SQL Database Projects extension in Visual Studio Code (or Visual Studio) you can import the existing SalesDW schema and automatically generate a .dacpac file. The .dacpac (Data-tier Application Component Package) is a portable artifact that represents the entire database schema and can be committed to a Git repository and deployed to other Fabric workspaces through CI/CD pipelines. Exporting a Power BI project captures reports and models, not warehouse objects; ARM template exports do not include Fabric warehouse schemas; and BACKUP DATABASE produces a backup, not a schema-focused deployment package suitable for source control and incremental deployments.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a .dacpac file, and why is it useful for databases?
How does the SQL Database Projects extension help in managing database schemas?
What are CI/CD pipelines, and how do they work in deploying database changes?
You manage a Microsoft Fabric lakehouse that stores a Delta table named SalesSilver. Each day, a landing folder receives a CSV file containing only new or changed rows identified by the TransactionID key column. You must design an incremental loading pattern that:
- Inserts new rows and updates existing rows.
- Minimizes code complexity and preserves Delta table history for time-travel.

Which approach should you implement?
Create a dataflow that deletes all rows from SalesSilver and then appends the full dataset from the landing folder.
Use a Spark notebook that executes a MERGE INTO SalesSilver AS tgt USING the incoming dataset AS src ON tgt.TransactionID = src.TransactionID, with UPDATE and INSERT clauses.
Run a Spark Structured Streaming job in trigger-once mode that writes the incoming data to SalesSilver using outputMode "append".
Configure a pipeline Copy activity to load the CSV file into SalesSilver each day in append mode.
Answer Description
Delta Lake supports the MERGE INTO statement, which performs an atomic upsert by matching source and target rows on a key column. Executing MERGE in a Spark notebook (or job) will insert new rows and update existing ones in a single operation, while automatically recording a new version of the table so that historical snapshots remain available. Loading with append-only modes, blindly deleting and re-appending data, or streaming in append mode would either duplicate data, lose history, or require additional logic to handle updates, making them less suitable for the required incremental pattern.
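A minimal Spark SQL sketch of this approach, assuming the daily CSV has been loaded into a temporary view named incoming_sales whose columns match the SalesSilver schema:

-- Upsert new and changed rows on the TransactionID key in one atomic operation
MERGE INTO SalesSilver AS tgt
USING incoming_sales AS src
  ON tgt.TransactionID = src.TransactionID
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Each MERGE commit records a new table version, so earlier snapshots stay queryable
DESCRIBE HISTORY SalesSilver;
SELECT * FROM SalesSilver VERSION AS OF 42;  -- the version number here is only illustrative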
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the MERGE INTO statement in Delta Lake?
What is Delta Lake's time-travel feature?
Why is incremental loading preferred over full data reloads?
You work in a Microsoft Fabric lakehouse. The Sales table has about 500 million rows, and the ProductSubcategory and ProductCategory tables each have fewer than 1,000 rows. You must build a daily Gold-layer table that denormalizes Sales with subcategory and category attributes while minimizing network shuffle and keeping the join in memory. Which Spark technique should you apply before running the joins?
Repartition the Sales DataFrame to a single partition, then perform the joins sequentially.
Disable Adaptive Query Execution so that Spark resorts to default shuffle hash joins.
Use the Spark broadcast() function (or BROADCAST join hint) on the two small lookup DataFrames before joining them to Sales.
Combine the three DataFrames with unionByName() and apply filters afterward.
Answer Description
Broadcasting very small lookup tables is a well-known Spark optimization. When you call broadcast() (or use the BROADCAST join hint) on ProductSubcategory and ProductCategory, the driver ships their data to every executor node. Each executor can then join its partition of the large Sales DataFrame locally, eliminating shuffle of the 500-million-row fact table. Repartitioning Sales to one partition forces single-threaded work, disabling AQE does not reduce shuffle, and unionByName() appends rows rather than joins.
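The same optimization can be written with the BROADCAST join hint in Spark SQL; the Gold table name and the join-key and attribute columns below are assumptions for illustration:

-- Broadcast the two small lookup tables so each executor joins its Sales partitions locally
CREATE OR REPLACE TABLE SalesGold AS
SELECT /*+ BROADCAST(psc), BROADCAST(pc) */
       s.*,
       psc.SubcategoryName,
       pc.CategoryName
FROM Sales AS s
JOIN ProductSubcategory AS psc
  ON s.ProductSubcategoryKey = psc.ProductSubcategoryKey
JOIN ProductCategory AS pc
  ON psc.ProductCategoryKey = pc.ProductCategoryKey;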
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a Spark broadcast join?
What is Adaptive Query Execution (AQE) in Spark?
How does network shuffle affect performance in Spark?
Your organization ingests daily CSV sales files into a Delta table in Microsoft Fabric by using a Dataflow Gen2. Some days, the upstream system resends one or more previous files, causing duplicate rows in the target table. You must modify the solution so that reprocessing the same file does not create duplicates and avoids a full table reload. What should you do?
Add a distinct transformation on all columns in the dataflow before the sink.
Change the dataflow to load the files into a new table each day and use UNION ALL to combine the tables.
Enable the Auto Compaction feature for the Delta table.
Replace the dataflow sink with a Fabric notebook that executes a Delta MERGE statement matching on TransactionID.
Answer Description
Executing a Delta MERGE operation allows you to perform idempotent upserts. The MERGE statement compares incoming rows to the existing Delta Lake table on a business key such as TransactionID; matching rows are updated and non-matching rows are inserted. Because the command is executed on every load, duplicates introduced by retransmitted files are eliminated without rewriting the whole table. A distinct transformation inside the dataflow would only remove duplicates within each individual batch, not across historical data. Auto Compaction optimizes small files but does not remove logical duplicates. Loading into a new table every day and using UNION ALL merely propagates the duplicates rather than removing them.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the purpose of a Delta MERGE statement in Microsoft Fabric?
How does Auto Compaction work in Delta tables?
Why is using UNION ALL not effective for removing duplicates in Dataflow Gen2?
While testing a report in a Microsoft Fabric warehouse, you run this T-SQL: SELECT OrderDate, COUNT(*) AS TotalOrders FROM dbo.SalesOrders WHERE OrderStatus = 'Completed';
It returns error 8120 stating that column 'OrderDate' must appear in the GROUP BY clause. You need the simplest fix so the query returns one row per day with the correct count. Which change should you make?
Add a GROUP BY OrderDate clause to the statement.
Insert the DISTINCT keyword after SELECT to eliminate duplicates.
Replace COUNT(*) with COUNT(OrderDate) to include the column in the aggregate.
Rewrite the aggregate as COUNT(*) OVER (PARTITION BY OrderDate).
Answer Description
The error is raised because the query mixes an aggregated column (COUNT(*)) with a non-aggregated column (OrderDate) without grouping. Adding OrderDate to a GROUP BY clause resolves the violation of the SQL aggregation rules and produces one summary row per distinct date. Merely switching to COUNT(OrderDate) does not remove the need to group; DISTINCT would still leave the aggregate unresolved; and a windowed COUNT(*) OVER (PARTITION BY OrderDate) calculates counts but returns a row for every order, not the required single row per day. Therefore, adding GROUP BY OrderDate is the correct and minimal correction.
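For reference, the corrected statement looks like this:

SELECT OrderDate, COUNT(*) AS TotalOrders
FROM dbo.SalesOrders
WHERE OrderStatus = 'Completed'
GROUP BY OrderDate;  -- one summary row per distinct order date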
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the purpose of the GROUP BY clause in SQL?
Why does the query produce SQL error 8120 if GROUP BY is missing?
How does COUNT(*) differ from COUNT(column_name) in T-SQL?
You are creating a workspace in Microsoft Fabric for a team of analysts who will build multiple Dataflows Gen2 that ingest files from an AWS S3 bucket into OneLake every night. To ensure that each dataflow can automatically create its own staging and destination lakehouse without manual configuration, which workspace Data workflow setting should you enable before the analysts start building the dataflows?
Enable "Validate queries before saving dataflows".
Enable "Use dataflow staging capacity during refresh".
Disable "Allow dataflows to create new items" and instruct authors to select existing lakehouses.
Enable "Auto create new lakehouse for dataflow output".
Answer Description
When the "Auto create new lakehouse for dataflow output" setting is enabled in the Data workflow section of a Fabric workspace, every new Dataflow Gen2 published to that workspace automatically provisions a dedicated lakehouse as its destination and configures the required staging area. This eliminates the need for authors to pre-create or select a lakehouse manually. The other options control how dataflows are refreshed or validated but do not cause automatic lakehouse creation.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a Dataflow Gen2 in Microsoft Fabric?
How does the 'Auto create new lakehouse for dataflow output' setting work?
What is the role of OneLake in Microsoft Fabric?
You are designing a lakehouse in Microsoft Fabric. To avoid duplicating data files that already reside in an Azure Data Lake Storage Gen2 account managed by another team, you decide to surface the folder /raw/finance in your lakehouse by using a shortcut. After the shortcut is created, analytics engineers will load the data with Spark, but the source team wants to guarantee that their files cannot be modified from your workspace.
Which statement about Fabric shortcuts satisfies the source team's requirement?
Shortcuts that reference Azure Data Lake Storage Gen2 are read-only, so Spark sessions in the lakehouse can read the files but cannot write or delete them.
Write access is controlled by the lakehouse item role assignment, not by the shortcut type, so you must remove the Engineer role to prevent changes.
Any shortcut becomes writable once the workspace owner is granted Contributor rights on the target storage account.
Fabric automatically creates a versioned copy of the target folder; engineers write to the copy while the original files stay untouched.
Answer Description
Shortcuts that point to external storage such as Azure Data Lake Storage Gen2 are mounted in Microsoft Fabric as read-only paths. Spark users in the destination lakehouse can read the data through the shortcut, but any attempt to create, update, or delete files under the shortcut will fail because Fabric blocks write operations to external locations. Only shortcuts that target another OneLake location are writable. Therefore, choosing an external shortcut automatically guarantees that the finance team's files remain unchanged, whereas the other options either describe writable shortcuts, unsupported behaviors, or unrelated features.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a shortcut in Microsoft Fabric?
Why are shortcuts to external storage read-only?
How does Spark interact with shortcuts in Microsoft Fabric?
A stored procedure in a Microsoft Fabric warehouse runs this statement to upsert rows from StgSales into DimCustomer:
MERGE dbo.DimCustomer AS tgt
USING dbo.StgSales AS src
ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
UPDATE SET tgt.City = src.City, tgt.Region = src.Region
WHEN NOT MATCHED BY TARGET THEN
INSERT (CustomerID, City, Region)
VALUES (src.CustomerID, src.City, src.Region);
Execution fails with the error: "The MERGE statement attempted to UPDATE or DELETE the same row more than once. A target row matched more than one source row."
You must correct the T-SQL so the procedure succeeds while still performing the required updates and inserts.
Which change should you make to the statement?
Replace the MERGE with an INSERT statement that uses the ON ERROR clause to ignore conflicts.
Rewrite the USING clause to select DISTINCT CustomerID, City, Region from dbo.StgSales before the MERGE is executed.
Execute SET IDENTITY_INSERT dbo.DimCustomer ON immediately before running the MERGE.
Add the table hint WITH (NOLOCK) to dbo.StgSales in the USING clause.
Answer Description
The error appears because more than one row in the source can match a single row in the target on CustomerID, so the MERGE would update the same target row multiple times. Selecting only one row per CustomerID in the source eliminates the conflict. Adding DISTINCT (or similar deduplication logic) in the USING clause guarantees each target row can match at most one source row, allowing the MERGE to complete. Using WITH (NOLOCK) can change the set of rows returned and does not guarantee removal of duplicates; SET IDENTITY_INSERT affects identity columns and is unrelated to this error; and INSERT … ON ERROR is not valid T-SQL syntax.
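A sketch of the corrected statement; SELECT DISTINCT assumes the duplicated staging rows are exact copies, otherwise a ROW_NUMBER-based filter on a change-date column would be needed:

MERGE dbo.DimCustomer AS tgt
USING (SELECT DISTINCT CustomerID, City, Region
       FROM dbo.StgSales) AS src  -- deduplicated source: at most one row per CustomerID
ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET tgt.City = src.City, tgt.Region = src.Region
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, City, Region)
    VALUES (src.CustomerID, src.City, src.Region);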
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a MERGE statement in T-SQL?
Why does adding DISTINCT resolve the error?
How does WITH (NOLOCK) affect the query in T-SQL?
You are ingesting JSON telemetry from an Eventstream into an eventhouse table named sensor_data. The Ingestion errors pane for the eventhouse repeatedly shows failures with the error "Unknown JSON mapping 'sensorMapping'". The table currently uses only the default automatic mapping. What should you do to resolve the ingestion failure and resume data flow?
Enable automatic schema update for the eventhouse so that new fields are added during ingestion.
Set the Eventstream output format to CSV to disable JSON format detection.
Create a JSON ingestion mapping named sensorMapping on the sensor_data table that matches the structure of the incoming messages.
Change the Eventstream sink to write the data to a lakehouse Delta table instead of the eventhouse.
Answer Description
The error message indicates that the ingestion request references a JSON mapping called sensorMapping, but no mapping with that name exists on the sensor_data table. Eventhouses (built on the Azure Data Explorer engine) require an ingestion mapping when the incoming data is JSON and a specific mapping name is supplied. Defining a JSON ingestion mapping with that exact name and column-to-path assignments lets the eventhouse correctly parse each record and the ingestion pipeline succeeds. Switching sinks, enabling automatic schema updates, or changing the output format do not address the missing mapping reference and therefore will not eliminate the error.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is JSON ingestion mapping in an eventhouse?
How is an eventhouse different from a lakehouse Delta table?
Why doesn't automatic schema update resolve the JSON ingestion error?
You create a new Microsoft Fabric workspace for a team of data engineers who will author notebooks and run Spark jobs. The team wants each interactive notebook session to shut down automatically if the user forgets to end it, so that capacity is released and costs are minimized. Which workspace-level setting should you configure to meet this requirement?
Disable the Data Science item type in the workspace.
Set the Session inactivity timeout under Spark session settings.
Configure the OneLake linked workspace setting.
Assign the workspace to a different Fabric capacity SKU.
Answer Description
In Microsoft Fabric, Spark compute used by notebooks is managed through the workspace's Spark session settings. The "Session inactivity timeout" setting defines how long an interactive Spark session can remain idle before Fabric automatically terminates it, releasing the capacity back to the pool. Adjusting this timeout at the workspace level enforces the automatic shutdown behavior for all notebooks created in that workspace. Other settings, such as disabling the Data Science item type, assigning the workspace to a different capacity SKU, or linking a OneLake workspace, do not control when idle Spark sessions are stopped.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is the 'Session inactivity timeout' in Spark settings?
How does Spark compute capacity work in Microsoft Fabric?
What are some key Spark session settings available in Microsoft Fabric?
Your company already lands sensor readings in Delta Lake tables inside an existing Fabric Lakehouse. A new project must run ad-hoc KQL queries over the latest readings with sub-second latency, but the compliance team insists that no additional copies of the data are created. In Real-Time Intelligence, which storage option should you choose for the KQL database that will power the queries?
Create a shortcut in the KQL database that points to the Lakehouse Delta tables.
Export the Delta tables to Azure Blob Storage and re-ingest the files into a new native KQL database.
Ingest the data into a native KQL database table by connecting the Lakehouse as a source.
Set up mirroring for the Lakehouse so that the Delta tables are replicated into Real-Time Intelligence.
Answer Description
Shortcuts let a KQL database in Real-Time Intelligence reference data that already resides in OneLake or a Fabric Lakehouse without copying the files. Because the data remains in its original Delta format, no regulatory-sensitive duplicates are created yet the database can still deliver low-latency, sub-second query performance. Native ingestion or mirroring would both create additional physical copies, and exporting then re-ingesting the data duplicates it again while adding unnecessary complexity.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a Delta Lake table?
How do KQL shortcuts work with Lakehouse Delta tables?
Why is sub-second query latency achievable with KQL shortcuts?
You manage a Microsoft Fabric workspace that contains a lakehouse named SalesLakehouse. A daily pipeline currently performs a full Copy data activity that loads the entire Sales.Orders table from an on-premises SQL Server database into a Delta table in the lakehouse. The Orders table has more than 50 million rows and continues to grow, causing the load to exceed the available refresh window.
You must modify the pipeline so that only rows that were inserted or updated since the previous run are copied, while keeping development effort to a minimum and avoiding custom Spark code.
Which change should you implement?
Create a Dataflow Gen2 that loads the Orders table without incremental refresh and drop the current pipeline.
Replace the Copy data activity with a Spark notebook that runs a MERGE INTO statement between the source and the Delta table.
Modify the pipeline to truncate the Delta table and reload the complete Orders data set at each run.
Enable incremental extraction in the existing Copy data activity by turning on change tracking and setting LastModifiedDate as the watermark column.
Answer Description
The Copy data activity used in Microsoft Fabric pipelines (based on Azure Data Factory) includes an incremental copy feature. By enabling change tracking and selecting a high-watermark column such as LastModifiedDate, the activity automatically filters the source query to retrieve only rows whose watermark value is greater than the one stored from the previous run. This satisfies the requirement to load only new and changed rows while reusing the existing activity with minimal additional development. Switching to a Spark notebook adds unnecessary complexity, truncating and reloading the full data set does not reduce load time, and a non-incremental Dataflow Gen2 would still rebuild the entire table.
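Conceptually, the incremental copy behaves as if the following filtered query (with a hypothetical @LastWatermarkValue parameter) were issued against the source instead of a full table read:

-- Rows inserted or changed since the last successful run (watermark value supplied by the pipeline)
SELECT *
FROM Sales.Orders
WHERE LastModifiedDate > @LastWatermarkValue;

-- After the copy succeeds, the new high-watermark is captured and stored for the next run
SELECT MAX(LastModifiedDate) AS NewWatermarkValue
FROM Sales.Orders;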
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is change tracking in the Copy data activity?
What is the role of a watermark column in pipeline configuration?
Why is a Spark notebook not suitable in this case?
You manage a Fabric Eventstream that ingests JSON telemetry from Azure IoT Hub and routes the data to an Eventhouse table. After a recent device firmware update, the Eventstream Monitoring dashboard shows a rapid increase in the "Failed to write events" metric for the Eventhouse output, while the "Input events" metric remains steady. Which action should you take first to identify the root cause of the failures?
Examine the rejected events in the Eventhouse destination's error store (dead-letter folder).
Delete and recreate the Eventhouse output with the "Auto create table" option enabled.
Refresh the Eventstream input schema to force automatic column mapping.
Stop and restart the Eventstream to clear transient write errors.
Answer Description
A sudden rise in the "Failed to write events" metric while input volume is unchanged indicates that events reach the Eventstream but are rejected when written to the Eventhouse table. The Eventstream automatically stores such rejected events in the error store (dead-letter folder) of the Eventhouse destination. Inspecting the error store lets you examine sample payloads and error details, such as schema-mapping or data-type mismatches, so you can determine why the write operation fails. Refreshing the Eventstream schema, restarting the Eventstream, or recreating the output does not provide the necessary diagnostic information before you understand the rejection reason.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is an Eventhouse table in Microsoft Fabric?
What is a dead-letter folder or error store in an Eventstream?
How does schema mapping affect data ingestion in Eventstreams?
You need to design a nightly process that ingests 200 GB of semi-structured JSON sales files from an Azure Storage account into a Microsoft Fabric Lakehouse. The solution must land the files unchanged, instantly expose them to several other Lakehouses without duplication, and then run PySpark code that performs complex joins and writes a cleansed Delta table. Which two Fabric capabilities should you combine to meet these requirements?
Use a pipeline Copy activity followed by a dataflow Gen2.
Mount the storage account in the Lakehouse and schedule a KQL transformation.
Enable mirroring on the storage container and query the mirrored tables with T-SQL.
Create a OneLake shortcut to the storage location and run a PySpark notebook.
Answer Description
A OneLake shortcut virtualizes the external JSON files so they appear in the Lakehouse immediately and can be referenced by multiple Lakehouses without copying or conversion. A notebook provides a PySpark runtime that can read the shortcut-linked files, perform complex joins and calculations, and write the results as a Delta table. A pipeline Copy activity would duplicate the data, Dataflow Gen2 cannot run arbitrary PySpark, mirroring targets relational sources, and KQL transformations do not support PySpark logic.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a OneLake shortcut in Microsoft Fabric?
How does PySpark work in Microsoft Fabric notebooks for data processing?
What is a Delta table, and why is it useful in Lakehouses?
You are designing a data pipeline in Microsoft Fabric that loads operational data into a Lakehouse-based star schema every hour. Dimension tables must retain type-2 history and use surrogate keys that stay unique across all incremental loads. Which action should you implement to prepare the dimension data before the fact tables are loaded?
Write the incoming dimension rows in append mode; let a GENERATED ALWAYS IDENTITY column assign surrogate keys automatically during the insert.
Use a Delta Lake MERGE statement that matches on the business key, expires the current row, and inserts a new row that receives a new surrogate key whenever any tracked attribute changes.
Overwrite the dimension table on every run by using a KQL dataflow that recreates the table from scratch.
Load the source table with COPY INTO and keep the original primary key from the operational system as the dimension key.
Answer Description
A Delta Lake MERGE statement lets you compare the incoming rows with the existing dimension on the business (natural) key. When a match is found and attribute values have changed, the MERGE can update the current row by setting its end-date and insert a new version that gets a new surrogate key. When no match exists, the MERGE simply inserts a brand-new row with a new surrogate key. This guarantees that each historical version of the dimension member keeps a unique surrogate key and that the latest version is flagged as current-requirements for a type-2 slowly changing dimension.
Appending data without comparison or overwriting the table on every run would either create duplicate keys or lose history. COPY INTO by itself does not handle updates, and reusing the operational system's primary key as the dimension key would break surrogate-key independence.
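A simplified two-step Spark SQL sketch of this type-2 pattern; the DimCustomer and StagedCustomer names, the CustomerSK surrogate key, the CustomerBK business key, and the City, Region, StartDate, EndDate, and IsCurrent columns are all illustrative, and a production load might instead use a single MERGE with a union-based merge key:

-- Step 1: expire the current version of members whose tracked attributes changed
MERGE INTO DimCustomer AS tgt
USING StagedCustomer AS src
  ON tgt.CustomerBK = src.CustomerBK AND tgt.IsCurrent = true
WHEN MATCHED AND (tgt.City <> src.City OR tgt.Region <> src.Region) THEN
  UPDATE SET tgt.IsCurrent = false,
             tgt.EndDate   = current_date();

-- Step 2: insert a new current version (with a new surrogate key) for new and changed members;
-- unchanged members still have a current row, so the anti join excludes them
INSERT INTO DimCustomer (CustomerSK, CustomerBK, City, Region, StartDate, EndDate, IsCurrent)
SELECT (SELECT COALESCE(MAX(CustomerSK), 0) FROM DimCustomer)
         + ROW_NUMBER() OVER (ORDER BY s.CustomerBK)  AS CustomerSK,
       s.CustomerBK,
       s.City,
       s.Region,
       current_date()      AS StartDate,
       CAST(NULL AS DATE)  AS EndDate,
       true                AS IsCurrent
FROM StagedCustomer AS s
LEFT ANTI JOIN DimCustomer AS d
  ON d.CustomerBK = s.CustomerBK AND d.IsCurrent = true;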
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a Delta Lake MERGE statement and how does it work?
What is a type-2 slowly changing dimension?
What is the difference between a surrogate key and a natural key?
That's It!
Looks like that's it! You can go back and review your answers or click the button below to grade your test.