GCP Professional Data Engineer Practice Question

Your company stores 600 TB of clickstream logs in an Amazon S3 bucket and ingests about 3 TB of new objects each day. You must replicate the data to a Cloud Storage bucket in us-central1 that is protected with a customer-managed encryption key (CMEK) within 1 hour of object creation, retry automatically after network interruptions, and avoid re-copying objects that have not changed. A 10 Gbps private link connects AWS and Google Cloud. Which approach should you take?

  • Ship 200 TB Transfer Appliance devices to Google each week and ingest the data into the CMEK-protected Cloud Storage bucket on arrival.

  • Provision a high-memory Compute Engine VM, mount the S3 bucket with s3fs, and schedule a cron job that runs gsutil -m rsync to copy files into the CMEK-protected bucket every 30 minutes.

  • Set up an S3 event notification to Amazon SNS that invokes a Cloud Function for every new object; have the function stream each object into the CMEK-protected Cloud Storage bucket using signed URLs.

  • Configure two recurring Storage Transfer Service jobs from the S3 bucket to the CMEK-protected Cloud Storage bucket, each scheduled hourly but offset by 30 minutes. Enable the overwrite-when-different option so unchanged objects are skipped, grant the Storage Transfer Service service agent permission to use the Cloud KMS key, and provide AWS credentials for source access. (Configuration sketches for this option follow the list.)
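The Storage Transfer Service option involves two setup steps worth seeing concretely. The first is the Cloud KMS grant: the service's per-project service agent needs roles/cloudkms.cryptoKeyEncrypterDecrypter on the bucket's key before it can write CMEK-protected objects. Below is a minimal Python sketch of that grant, assuming the google-cloud-storage-transfer and google-cloud-kms client libraries; the project, key ring, and key names are placeholders.

```python
# Sketch: grant the Storage Transfer Service agent use of the CMEK key.
# Assumes google-cloud-storage-transfer and google-cloud-kms are installed;
# all resource names below are placeholders.
from google.cloud import kms, storage_transfer

PROJECT_ID = "my-project"      # placeholder project
KEY_RING = "my-key-ring"       # placeholder key ring in us-central1
CMEK_KEY = "clickstream-cmek"  # placeholder CMEK key

# Look up this project's Storage Transfer Service service agent.
sts_client = storage_transfer.StorageTransferServiceClient()
agent = sts_client.get_google_service_account(request={"project_id": PROJECT_ID})

# Bind the encrypter/decrypter role on the key so the agent can write
# objects into the CMEK-protected destination bucket.
kms_client = kms.KeyManagementServiceClient()
key_path = kms_client.crypto_key_path(PROJECT_ID, "us-central1", KEY_RING, CMEK_KEY)
policy = kms_client.get_iam_policy(request={"resource": key_path})
policy.bindings.add(
    role="roles/cloudkms.cryptoKeyEncrypterDecrypter",
    members=[f"serviceAccount:{agent.account_email}"],
)
kms_client.set_iam_policy(request={"resource": key_path, "policy": policy})
```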
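The second step is the pair of recurring jobs. The sketch below, again with placeholder bucket names and credentials, creates two hourly jobs staggered at :00 and :30 past the hour so new S3 objects are picked up within roughly 30 to 60 minutes. Storage Transfer Service retries interrupted transfers on its own, and setting overwrite_when to DIFFERENT makes it rewrite a sink object only when the source copy differs, which covers the no-re-copy requirement.

```python
# Sketch: two hourly S3 -> Cloud Storage transfer jobs, offset by 30 minutes.
# Bucket names and AWS credentials are placeholders for illustration.
from datetime import date

from google.cloud import storage_transfer
from google.protobuf import duration_pb2

PROJECT_ID = "my-project"           # placeholder
S3_BUCKET = "clickstream-logs"      # placeholder S3 source bucket
GCS_BUCKET = "clickstream-replica"  # placeholder CMEK-protected sink

client = storage_transfer.StorageTransferServiceClient()
start = date.today()

for offset in (0, 30):  # two jobs, staggered by 30 minutes
    job = {
        "project_id": PROJECT_ID,
        "description": f"S3 replication at :{offset:02d} past each hour",
        "status": storage_transfer.TransferJob.Status.ENABLED,
        "schedule": {
            "schedule_start_date": {
                "year": start.year, "month": start.month, "day": start.day,
            },
            "start_time_of_day": {"hours": 0, "minutes": offset},
            "repeat_interval": duration_pb2.Duration(seconds=3600),  # hourly
        },
        "transfer_spec": {
            "aws_s3_data_source": {
                "bucket_name": S3_BUCKET,
                "aws_access_key": {
                    "access_key_id": "AWS_ACCESS_KEY_ID",          # placeholder
                    "secret_access_key": "AWS_SECRET_ACCESS_KEY",  # placeholder
                },
            },
            "gcs_data_sink": {"bucket_name": GCS_BUCKET},
            "transfer_options": {
                # Rewrite a sink object only when the source copy differs,
                # so unchanged objects are never re-copied.
                "overwrite_when": (
                    storage_transfer.TransferOptions.OverwriteWhen.DIFFERENT
                ),
            },
        },
    }
    client.create_transfer_job({"transfer_job": job})
```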

Objective: Ingesting and processing the data