Your retail company operates a 600 TB on-premises Hadoop cluster that stores historical sales logs. The corporate data center connects to Google Cloud over a 200 Mbps dedicated link that also carries other production traffic. After an initial one-time backfill of the 600 TB of historical data, the cluster produces about 1 TB of new log data each day. All data must become queryable in BigQuery, the existing 200 Mbps link must not be saturated, and ongoing operational effort should be minimal. Which approach should you recommend?
Configure BigQuery Data Transfer Service to connect to the on-prem Hadoop cluster, performing an initial full import and scheduling daily incremental transfers.
Use gsutil rsync over the existing 200 Mbps link to copy the 600 TB and the 1 TB daily increments into a Cloud Storage location that BigQuery reads as an external table.
Provision a 10 Gbps Dedicated Interconnect and run a continuous Dataflow pipeline that streams both historical and daily data directly from the Hadoop cluster into BigQuery.
Ship a Transfer Appliance to move the 600 TB of historical data into Cloud Storage, then schedule daily Storage Transfer Service jobs with on-prem agents to copy new HDFS files to the same bucket and load them into BigQuery.
Copying 600 TB over a 200 Mbps line would take roughly nine months even at full line rate and would monopolize the WAN, so an offline bulk transfer is preferable. Ordering a Transfer Appliance lets the team load the historical data locally and ship the device to Google, where the data is imported into Cloud Storage without consuming network bandwidth. After the backfill, on-prem Storage Transfer Service agents can run scheduled, bandwidth-throttled jobs that move only the new 1 TB of daily files to the same Cloud Storage bucket. BigQuery can then ingest the data through scheduled load jobs or query it in place through external tables. The other options each fail a requirement: provisioning a 10 Gbps Dedicated Interconnect adds cost and operational overhead; gsutil rsync would push the 600 TB backfill over the 200 Mbps link and overload the network; and BigQuery Data Transfer Service cannot connect directly to on-prem Hadoop. Combining Transfer Appliance for the initial bulk move with Storage Transfer Service for ongoing increments best meets the requirements.
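The bandwidth reasoning above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original question: the function names are made up for illustration, and it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol overhead and competing traffic, so real transfers would be slower.

```python
SECONDS_PER_DAY = 86_400


def transfer_seconds(size_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Seconds needed to move size_tb terabytes over a link_mbps link.

    utilization is the fraction of the link the transfer may use
    (an assumed throttle setting, between 0 exclusive and 1 inclusive).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = size_tb * 1e12 * 8                    # decimal terabytes to bits
    bits_per_second = link_mbps * 1e6 * utilization
    return bits / bits_per_second


def required_mbps(size_tb: float, window_seconds: float = SECONDS_PER_DAY) -> float:
    """Average rate in Mbps needed to move size_tb within window_seconds."""
    return size_tb * 1e12 * 8 / window_seconds / 1e6


if __name__ == "__main__":
    backfill_days = transfer_seconds(600, 200) / SECONDS_PER_DAY
    daily_hours = transfer_seconds(1, 200) / 3600
    print(f"600 TB backfill at full line rate: {backfill_days:.1f} days")
    print(f"1 TB daily increment at full line rate: {daily_hours:.1f} hours")
    print(f"Average rate to keep up with 1 TB/day: {required_mbps(1):.1f} Mbps")
```

Under these assumptions the backfill alone needs about 278 days of a fully saturated link, while the daily increment needs about 11 hours at full rate, or an average of roughly 93 Mbps spread across the day. That is why the backfill goes offline and the increments go through a throttled, scheduled transfer.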
GCP Professional Data Engineer
Ingesting and processing the data