GCP Professional Data Engineer Practice Question

You are designing a BigQuery warehouse for an online-learning platform. Each test submission arrives as a JSON object that contains metadata about the submission (submission_id, student_id, timestamp, total_score) and an array of 40-60 questionResponse objects (question_id, is_correct, score). Analysts frequently need daily reports showing the average test score per student and occasionally need to drill into individual question responses for troubleshooting. You must minimize storage scanned and avoid joins during typical queries. Which table design best meets these requirements?

Create one table with a row per submission and an ARRAY<STRUCT<question_id INT64, is_correct BOOL, score FLOAT64>> column to store all question responses for that submission.
Create a wide, flattened table with one row per question response. Duplicate submission metadata columns across every row.
Create two tables: a submission fact table and a questionResponses dimension table keyed by submission_id, and join them at query time.
Store the raw JSON files in Cloud Storage and query them as an external table to avoid schema design changes.

Answer Description

A single table with one row per submission and a repeated STRUCT column that holds all questionResponse records keeps each submission's parent-level attributes and its child records in the same row. Because BigQuery stores nested and repeated fields in a columnar layout, a query that calculates the average total_score reads only the parent columns it references and skips the question-response data entirely, so it scans far less data than the same query against a flattened table that repeats submission metadata on each of the 40-60 rows per submission. When analysts need individual question responses, they can UNNEST the array. This design eliminates the submission-to-question join that a normalized two-table design would require and avoids the column duplication of a fully flattened table. Querying the raw JSON in Cloud Storage as an external table would also avoid joins, but BigQuery would have to read and parse the full JSON files on every query, and the data would lose optimizations available to native tables, such as column pruning and clustering.
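The two access patterns described above can be sketched in plain Python. This is only a model of the row shape, not BigQuery itself: the sample rows, the `day` field (standing in for the date part of timestamp), and the helper names `daily_average_scores` and `unnest_responses` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: one dict per submission, with the question
# responses nested as a list of dicts (the ARRAY<STRUCT<...>> column).
submissions = [
    {
        "submission_id": "s1", "student_id": "alice",
        "day": date(2024, 1, 15), "total_score": 80.0,
        "question_responses": [
            {"question_id": 1, "is_correct": True, "score": 2.0},
            {"question_id": 2, "is_correct": False, "score": 0.0},
        ],
    },
    {
        "submission_id": "s2", "student_id": "alice",
        "day": date(2024, 1, 15), "total_score": 90.0,
        "question_responses": [
            {"question_id": 1, "is_correct": True, "score": 2.0},
        ],
    },
    {
        "submission_id": "s3", "student_id": "bob",
        "day": date(2024, 1, 15), "total_score": 70.0,
        "question_responses": [
            {"question_id": 1, "is_correct": False, "score": 0.0},
        ],
    },
]


def daily_average_scores(rows):
    """Average total_score per (student_id, day).

    Only parent-level fields are read; the nested list is never touched,
    which mirrors a report query that references parent columns only.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        key = (row["student_id"], row["day"])
        totals[key][0] += row["total_score"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def unnest_responses(rows, submission_id):
    """Flatten one submission's responses, similar in spirit to UNNEST."""
    return [
        {"submission_id": row["submission_id"], **response}
        for row in rows
        if row["submission_id"] == submission_id
        for response in row["question_responses"]
    ]


averages = daily_average_scores(submissions)
drill_down = unnest_responses(submissions, "s1")
```

The report path works without ever expanding `question_responses`, and the drill-down path produces one flat record per question only when an analyst asks for it. In both cases no second table is involved, so no join is needed.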
