You are asked to build an end-to-end retrieval-augmented generation (RAG) solution that stays entirely inside BigQuery. Product-support articles are already ingested into a table customer_docs(doc_id STRING, text STRING). You must (1) turn every text row into a vector, (2) store the vectors for fast similarity search, and (3) retrieve the most relevant passages at runtime when a user submits a question. Which implementation satisfies all three requirements with minimal custom infrastructure?
Train an AutoML text classification model with CREATE MODEL, use ML.PREDICT to label documents, and return articles whose label matches the intent predicted for the user's question.
Apply ML.FEATURE_CROSS and ML.NORMALIZER to transform the text column, then build a materialized view and join on cosine distance calculated in SQL whenever a user asks a question.
Create a new table with SELECT doc_id, ML.GENERATE_EMBEDDING(MODEL textembedding-gecko, text) AS embedding FROM customer_docs, storing the result as an ARRAY column. At query time, embed the user prompt with the same function and call VECTOR_SEARCH over the table to return the top K matching rows.
Export customer_docs to Cloud Storage as JSON, invoke a Vertex AI batch prediction job to create embeddings, load the output back as an external BigLake table, and query it with standard equality filters.
ML.GENERATE_EMBEDDING calls a Vertex AI text-embedding model from inside BigQuery, through a remote model, and returns a fixed-length ARRAY<FLOAT64> for each input row. Persisting those arrays in a BigQuery table lets the table serve as an in-database vector store. At serving time you embed the incoming question with the same model via ML.GENERATE_EMBEDDING and use VECTOR_SEARCH to find the nearest stored vectors; the lookup is approximate when the table has a vector index and falls back to an exact brute-force scan otherwise. The other choices fail for different reasons: the AutoML classifier matches intent labels through ML.PREDICT rather than semantic similarity, ML.FEATURE_CROSS and ML.NORMALIZER are feature-engineering functions that do not create semantic embeddings, and the Cloud Storage export adds a pipeline outside BigQuery and then queries with equality filters, which cannot rank passages by similarity.
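The three steps of the correct choice can be sketched in BigQuery SQL roughly as follows. This is a minimal sketch, not a definitive implementation: the dataset name `support`, the connection `us.vertex_conn`, the model name `embedding_model`, the endpoint string, the output table name, and the sample question are all assumptions introduced for illustration. Note that ML.GENERATE_EMBEDDING is a table-valued function that reads an input column named `content`, so the real syntax differs in form from the shorthand used in the answer choice.

```sql
-- Assumed names throughout: dataset `support`, connection `us.vertex_conn`,
-- endpoint 'text-embedding-004'. Substitute your own.

-- One-time setup: a remote model that points at a Vertex AI embedding endpoint.
CREATE OR REPLACE MODEL `support.embedding_model`
  REMOTE WITH CONNECTION `us.vertex_conn`
  OPTIONS (ENDPOINT = 'text-embedding-004');

-- Steps 1 and 2: embed every row and persist the vectors in a table.
CREATE OR REPLACE TABLE `support.customer_docs_embedded` AS
SELECT
  doc_id,
  content AS text,
  ml_generate_embedding_result AS embedding  -- ARRAY<FLOAT64>
FROM ML.GENERATE_EMBEDDING(
  MODEL `support.embedding_model`,
  -- The function expects the text in a column named `content`.
  (SELECT doc_id, text AS content FROM `support.customer_docs`),
  STRUCT(TRUE AS flatten_json_output)
);

-- Step 3: embed the question with the same model and fetch the closest rows.
SELECT
  base.doc_id,
  base.text,
  distance
FROM VECTOR_SEARCH(
  TABLE `support.customer_docs_embedded`,
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `support.embedding_model`,
      (SELECT 'How do I reset my password?' AS content),
      STRUCT(TRUE AS flatten_json_output)
    )
  ),
  top_k => 5,
  distance_type => 'COSINE'
)
ORDER BY distance;
```

Without a vector index, VECTOR_SEARCH scans every stored vector, which is usually acceptable for a small support corpus. On larger tables, a CREATE VECTOR INDEX statement on the `embedding` column is what enables approximate-nearest-neighbor lookup.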
GCP Professional Data Engineer
Preparing and using data for analysis