A retail media team ingests product descriptions, reviews, and support-chat transcripts into BigQuery. They need to power a retrieval-augmented generation (RAG) service that answers natural-language questions about product issues. Design goals include:
- The LLM must receive only the top 10 semantically closest text chunks, each ≤ 2 KB.
- Embeddings should use a managed, up-to-date model without exporting data outside Google Cloud.
- Monthly ingestion adds 20 million new chunks; query latency must stay below 500 ms.
Which architecture best meets the goals while minimizing operational overhead?
Generate embeddings with a custom TensorFlow container on Vertex AI Pipelines, store them in Firestore, and run similarity queries with a Cloud Run microservice that implements HNSW.
Export the text as JSON to Cloud Storage, use the open-source FAISS library on a GKE Autopilot cluster for indexing and ANN search, and stream the results back into BigQuery before calling the LLM.
Store the text in a Bigtable row for each chunk, use Dataproc Serverless with Spark MLlib to build word2vec embeddings, and push the top 10 matches into a BI Engine cache that the LLM queries.
Use ML.GENERATE_EMBEDDING to write vectors into a BigQuery table clustered on the embedding column, and issue VECTOR_SEARCH queries at read time to retrieve the 10 nearest chunks for the prompt sent to the LLM.
ML.GENERATE_EMBEDDING lets BigQuery call a Vertex AI foundation model such as textembedding-gecko without exporting data outside Google Cloud. Storing the resulting FLOAT64 arrays in a clustered BigQuery table keeps loading simple, and approximate nearest-neighbor search with VECTOR_SEARCH can return the 10 closest chunks comfortably within the 500 ms latency target. BI Engine is irrelevant because the workload is vector similarity search, not SQL aggregation. Exporting to an external vector database or running custom embedding containers would add unnecessary operational overhead and work against the goal of keeping data inside Google Cloud.
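To make the winning pattern concrete, here is a minimal GoogleSQL sketch of the flow, not a drop-in implementation: the dataset, table, column, and connection names (retail.chunks, retail.embedding_model, us.vertex_conn, the chunk_id/chunk_text columns, and the sample question) are all hypothetical, and exact endpoints and options should be checked against current BigQuery ML documentation.

```sql
-- Sketch only: all object names below are hypothetical.

-- 1. Register a remote model that points at a Vertex AI embedding endpoint,
--    so embedding happens in-database without data leaving Google Cloud.
CREATE OR REPLACE MODEL retail.embedding_model
  REMOTE WITH CONNECTION `us.vertex_conn`
  OPTIONS (ENDPOINT = 'textembedding-gecko');

-- 2. Embed each text chunk. ML.GENERATE_EMBEDDING expects a column named
--    content and emits ml_generate_embedding_result (an ARRAY<FLOAT64>)
--    alongside the pass-through columns.
CREATE OR REPLACE TABLE retail.chunk_embeddings AS
SELECT chunk_id, content, ml_generate_embedding_result
FROM ML.GENERATE_EMBEDDING(
  MODEL retail.embedding_model,
  (SELECT chunk_id, chunk_text AS content FROM retail.chunks)
);

-- 3. At query time, embed the user question the same way and fetch the
--    10 nearest chunks to assemble the RAG prompt.
SELECT base.chunk_id, base.content, distance
FROM VECTOR_SEARCH(
  TABLE retail.chunk_embeddings,
  'ml_generate_embedding_result',
  (
    SELECT ml_generate_embedding_result
    FROM ML.GENERATE_EMBEDDING(
      MODEL retail.embedding_model,
      (SELECT 'Why does the blender overheat?' AS content)
    )
  ),
  top_k => 10
);
```

One practical note: at this scale, VECTOR_SEARCH scans by brute force unless a vector index exists, so a production setup would typically also run CREATE VECTOR INDEX on the embedding column to get approximate (ANN) search within the latency budget.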