GCP Professional Data Engineer Practice Question

Your team supports a Looker Studio dashboard that reads from the 2-TB BigQuery table retail.fact_sales, which is partitioned by the sale_timestamp column (type: DATE). In the dashboard's SQL you see the predicate WHERE DATE(sale_timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE(). In Query Execution Details, the scan stage reads 65 GB, far more than the ~5 GB stored in the last seven partitions, and every chart is slow as a result. Without changing the date range the dashboard covers or buying more slots, which action is most likely to reduce the scanned bytes and speed up the queries?

Enable automatic query caching so repeated dashboard queries are served entirely from cached results instead of scanning the table.
Convert fact_sales from a partitioned table to a table clustered on sale_timestamp, because clustering is more efficient than partitioning for date filters.
Purchase additional BigQuery slots so the query can run with more parallel workers and finish sooner despite the larger scan.
Rewrite the filter to compare the sale_timestamp column directly to DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) and CURRENT_DATE(), eliminating the DATE() wrapper so partition pruning can occur.

Answer Description

Partition pruning in BigQuery is most reliable when the filter compares the partitioning column directly to constant expressions, as in sale_timestamp BETWEEN … or sale_timestamp >= …. Wrapping the partition column in a function such as DATE() can prevent BigQuery from recognizing that only the recent partitions are needed, so the query reads far more partitions than necessary (65 GB instead of roughly 5 GB). Rewriting the predicate to compare the partition column directly (sale_timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()) lets BigQuery prune automatically, limiting the scan to the relevant recent partitions and improving performance. Buying more slots is ruled out by the question and would not reduce the bytes scanned in any case. Query caching does not shrink the scan either, and BigQuery does not serve cached results for queries that call non-deterministic functions such as CURRENT_DATE(). Switching from partitioning to clustering alone cannot replace effective partition pruning.
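
One way to check the effect of the rewrite before touching the dashboard is to dry-run both predicates and compare the estimated bytes. The sketch below is illustrative only: it assumes the google-cloud-bigquery Python client is installed and default credentials are configured, and the helper names (build_query, estimate_bytes, compare_filters) and the COUNT(*) query shape are hypothetical, not taken from the dashboard.

```python
"""Compare estimated scan size for the wrapped vs. direct partition filter."""

TABLE = "retail.fact_sales"  # table name from the question

# Original predicate: the partition column is wrapped in DATE().
WRAPPED_FILTER = (
    "DATE(sale_timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) "
    "AND CURRENT_DATE()"
)

# Rewritten predicate: the partition column is compared directly.
DIRECT_FILTER = (
    "sale_timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) "
    "AND CURRENT_DATE()"
)


def build_query(where_clause: str, table: str = TABLE) -> str:
    """Return a simple aggregate query that uses the given WHERE clause.

    The COUNT(*) shape is a stand-in for the real dashboard query.
    """
    return f"SELECT COUNT(*) AS row_count FROM `{table}` WHERE {where_clause}"


def estimate_bytes(sql: str) -> int:
    """Dry-run the query and return BigQuery's estimate of bytes processed.

    A dry run validates the query and reports the estimate without
    executing it, so no slots are consumed.
    """
    # Imported lazily because it is a third-party dependency.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def compare_filters() -> dict:
    """Return the estimated bytes for each version of the filter."""
    return {
        "wrapped": estimate_bytes(build_query(WRAPPED_FILTER)),
        "direct": estimate_bytes(build_query(DIRECT_FILTER)),
    }
```

Calling compare_filters() against the real project would show whether the direct comparison actually lowers the estimate for this table; if the two numbers match, the extra scan has a different cause and the predicate is not the bottleneck.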
