A retail media team ingests product descriptions, reviews, and support-chat transcripts into BigQuery. They need to power a retrieval-augmented generation (RAG) service that answers natural-language questions about product issues. Design goals include:
- The LLM must receive only the top 10 semantically closest text chunks, each ≤ 2 KB.
- Embeddings should use a managed, up-to-date model without exporting data outside Google Cloud.
- Monthly ingestion adds 20 million new chunks; query latency must stay below 500 ms.
Which architecture best meets the goals while minimizing operational overhead?
Generate embeddings with a custom TensorFlow container on Vertex AI Pipelines, store them in Firestore, and run similarity queries with a Cloud Run microservice that implements HNSW.
Export the text as JSON to Cloud Storage, use the open-source FAISS library on a GKE Autopilot cluster for indexing and ANN search, and stream the results back into BigQuery before calling the LLM.
Store the text in a Bigtable row for each chunk, use Dataproc Serverless with Spark MLlib to build word2vec embeddings, and push the top 10 matches into a BI Engine cache that the LLM queries.
Use ML.GENERATE_EMBEDDING to write vectors into a BigQuery table clustered on the embedding column, and issue VECTOR_SEARCH queries at read time to retrieve the 10 nearest chunks for the prompt sent to the LLM.
ML.GENERATE_EMBEDDING lets BigQuery call a Vertex AI foundation model such as textembedding-gecko without exporting data outside Google Cloud. Storing the resulting FLOAT64 arrays in a clustered BigQuery table keeps loading simple, and approximate nearest-neighbor search with VECTOR_SEARCH can return the 10 closest chunks comfortably within the 500 ms latency target. BI Engine is irrelevant because the workload is vector similarity search, not SQL aggregation. Exporting to an external vector database or running custom embedding containers would add unnecessary operational overhead and work against the goal of keeping data inside Google Cloud.
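To make the winning pattern concrete, here is a minimal GoogleSQL sketch of the flow, not a drop-in implementation: the dataset, table, column, and connection names (retail.chunks, retail.embedding_model, us.vertex_conn, the chunk_id/chunk_text columns, and the sample question) are all hypothetical, and exact endpoints and options should be checked against current BigQuery ML documentation.

```sql
-- Sketch only: all object names below are hypothetical.

-- 1. Register a remote model that points at a Vertex AI embedding endpoint,
--    so embedding happens in-database without data leaving Google Cloud.
CREATE OR REPLACE MODEL retail.embedding_model
  REMOTE WITH CONNECTION `us.vertex_conn`
  OPTIONS (ENDPOINT = 'textembedding-gecko');

-- 2. Embed each text chunk. ML.GENERATE_EMBEDDING expects a column named
--    content and emits ml_generate_embedding_result (an ARRAY<FLOAT64>)
--    alongside the pass-through columns.
CREATE OR REPLACE TABLE retail.chunk_embeddings AS
SELECT chunk_id, content, ml_generate_embedding_result
FROM ML.GENERATE_EMBEDDING(
  MODEL retail.embedding_model,
  (SELECT chunk_id, chunk_text AS content FROM retail.chunks)
);

-- 3. At query time, embed the user question the same way and fetch the
--    10 nearest chunks to assemble the RAG prompt.
SELECT base.chunk_id, base.content, distance
FROM VECTOR_SEARCH(
  TABLE retail.chunk_embeddings,
  'ml_generate_embedding_result',
  (
    SELECT ml_generate_embedding_result
    FROM ML.GENERATE_EMBEDDING(
      MODEL retail.embedding_model,
      (SELECT 'Why does the blender overheat?' AS content)
    )
  ),
  top_k => 10
);
```

One practical note: at this scale, VECTOR_SEARCH scans by brute force unless a vector index exists, so a production setup would typically also run CREATE VECTOR INDEX on the embedding column to get approximate (ANN) search within the latency budget.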