Data Storage and Databases (GCP PDE) Flashcards
GCP Professional Data Engineer Flashcards

| Front | Back |
|---|---|
| Archive Storage ideal use case | Best for long-term retention of data accessed less than once a year |
| BigQuery clustering | Organizes data using a field to optimize performance for specific query patterns |
| BigQuery data sharing | Uses datasets and authorized views to share data securely across projects |
| BigQuery export formats | Supports exporting data in formats like CSV, JSON, and Avro to Cloud Storage |
| BigQuery federated queries | Enables querying external data sources like Google Sheets or Cloud Storage files |
| BigQuery flat-rate pricing | Provides predictable costs by purchasing a dedicated amount of query processing capacity |
| BigQuery integration with Looker Studio | Enables interactive visualization and reporting for datasets |
| BigQuery optimization | Use partitioned or clustered tables for improved query performance |
| BigQuery partitioning | Helps optimize query costs by organizing data based on a specific column like date |
| BigQuery pricing model | Charges are based on storage and the amount of data processed during queries |
| BigQuery reserved slots | Offers guaranteed computing resources to improve query performance |
| BigQuery scalability | Supports petabytes of data with serverless querying |
| BigQuery slot-based pricing | Determines query performance based on the number of slots purchased |
| BigQuery use case | Best for analyzing large-scale datasets using SQL-like queries |
| Bigtable backup options | Supports table-level backups within a cluster, restorable to the state captured at backup time |
| Bigtable consistency model | Strongly consistent within a single cluster; replication across clusters is eventually consistent |
| Bigtable data model | Wide-column database optimized for sparse data |
| Bigtable indexing | Uses row keys for primary indexing with no built-in secondary indexes |
| Bigtable locality | Stores data physically adjacent based on row keys for faster access |
| Bigtable query limitations | Requires row-key optimization as it does not natively support complex joins or aggregations |
| Bigtable region distribution | Can replicate clusters across multiple zones or regions for disaster tolerance |
| Bigtable replication | Used for availability and disaster recovery purposes |
| Bigtable row design | Design row keys to optimize data access patterns |
| Bigtable scaling | Automatically adjusts to handle increased throughput or storage without downtime |
| Bigtable use case | Best for low-latency operations on large-scale time-series data |
| Cloud Datastore vs Bigtable | Datastore is better for transactional consistency, while Bigtable is better for analytics and high throughput |
| Cloud Firestore use case | Best for mobile and web applications requiring offline support and real-time synchronization |
| Cloud Firestore vs Datastore | Firestore provides advanced querying and offline support, while Datastore offers simpler APIs |
| Cloud Spanner use case | Best for horizontally scalable relational databases with strong consistency requirements |
| Cloud SQL backups | Supports automated and on-demand backups for disaster recovery |
| Cloud SQL replication types | Supports both asynchronous and synchronous replication for high-availability scenarios |
| Cloud SQL use case | Best for relational databases requiring compatibility with MySQL, PostgreSQL, or SQL Server |
| Cloud Storage data lifecycle management | Policies automatically delete or transition objects between tiers based on age |
| Cloud Storage IAM | Used for granular control over who can access and manage data |
| Cloud Storage scalability | Automatically scales to handle large amounts of unstructured data |
| Cloud Storage signed URLs | Provides temporary access to specific objects using a time-limited URL |
| Cloud Storage tiers | Standard, Nearline, Coldline, and Archive |
| Cloud Storage use case | Best for storing unstructured data like images, videos, and backups |
| Cloud Storage vs Persistent Disk | Cloud Storage is object storage, while Persistent Disk is block storage attached to VMs |
| Coldline Storage ideal use case | Best for data accessed less than once a quarter (90-day minimum storage duration) |
| Datastore access control | Uses IAM policies to define permissions at project, entity group, or key levels |
| Datastore indexing | Automatically indexes properties for queries but allows custom index configuration |
| Datastore queries | Support strong or eventual consistency depending on query type |
| Datastore relationship modeling | Supports nested entities and ancestor query patterns |
| Datastore transactions | Allow atomicity for multiple operations across multiple entities |
| Datastore use case | Best for scalable NoSQL applications and transactional workloads |
| Nearline Storage ideal use case | Best for data accessed less than once a month but at least once a quarter (30-day minimum storage duration) |
| Streaming data to BigQuery use case | Best for real-time analytics and dashboards |
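The partitioning and clustering cards above can be made concrete with BigQuery DDL. This is a sketch only; the dataset, table, and column names (`mydataset.events`, `event_date`, `customer_id`) are hypothetical:

```sql
-- Hypothetical table: partition by day on event_date,
-- cluster by customer_id to co-locate related rows.
CREATE TABLE mydataset.events (
  event_date  DATE,
  customer_id STRING,
  payload     JSON
)
PARTITION BY event_date
CLUSTER BY customer_id;

-- Queries that filter on the partition column prune
-- partitions and scan (and bill for) fewer bytes:
SELECT customer_id, COUNT(*) AS n
FROM mydataset.events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY customer_id;
```

Because BigQuery charges on-demand queries by bytes processed, the partition filter directly reduces cost, while clustering narrows the blocks read within each partition.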
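The Bigtable cards note that row keys are the only built-in index and that key design drives access patterns. A common pattern for time-series data is to prefix the key with an entity id and append a zero-padded reversed timestamp so the most recent rows sort first. A minimal sketch in plain Python — `make_row_key` is an illustrative helper, not part of any Bigtable client library:

```python
# Largest 13-digit millisecond timestamp; keeps the reversed
# value at a fixed width so lexicographic order == numeric order.
MAX_TS = 10**13 - 1

def make_row_key(entity_id: str, ts_millis: int) -> str:
    """Build a key like 'device42#8299999999999' where newer
    events sort lexicographically before older ones."""
    reversed_ts = MAX_TS - ts_millis
    return f"{entity_id}#{reversed_ts:013d}"

older = make_row_key("device42", 1_700_000_000_000)
newer = make_row_key("device42", 1_700_000_100_000)
# Newer events get lexicographically smaller keys,
# so a prefix scan returns the latest data first.
assert newer < older
```

Grouping all of one entity's rows under a shared prefix also preserves Bigtable's locality property from the cards above: adjacent keys are stored physically adjacent.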
About the Flashcards
Flashcards for the GCP Professional Data Engineer exam focus on Google Cloud's core data storage and analytic services. Each card pairs product capabilities with ideal use cases, guiding you on when to select Cloud Storage, BigQuery, Bigtable, Datastore, Cloud SQL, Spanner, or Firestore. Architecture choices like object versus block storage and real-time versus batch analytics are reinforced.
Review key concepts such as storage tiers, partitioning, clustering, replication, pricing models, IAM, and data lifecycle policies. The deck also highlights performance optimization, consistency trade-offs, backup strategies, and integration patterns, providing concise reminders of the terminology and decision points most likely to appear on exam scenarios.
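The lifecycle policies mentioned above are expressed as a JSON document attached to a bucket, for example with `gsutil lifecycle set policy.json gs://my-bucket`. A sketch, with illustrative age thresholds that follow the tier minimums on the cards:

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
```

Each rule pairs an action (transition or delete) with a condition; here objects step down through Nearline and Coldline as they age and are deleted after a year.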
Topics covered in this flashcard deck:
- Google Cloud storage tiers
- BigQuery architecture & pricing
- Bigtable design & scalability
- Datastore and Firestore consistency
- Cloud SQL & Spanner relational options
- IAM and data lifecycle