Data Storage and Databases (GCP PDE) Flashcards
GCP Professional Data Engineer Flashcards

| Front | Back |
|---|---|
| Archive Storage ideal use case | Best for long-term retention of data accessed less than once a year |
| BigQuery clustering | Organizes data using a field to optimize performance for specific query patterns |
| BigQuery data sharing | Uses datasets and authorized views to share data securely across projects |
| BigQuery export formats | Supports exporting data in formats like CSV, JSON, and Avro to Cloud Storage |
| BigQuery federated queries | Enables querying external data sources like Google Sheets or Cloud Storage files |
| BigQuery flat-rate pricing | Provides predictable costs by purchasing a dedicated amount of query processing capacity |
| BigQuery integration with Looker Studio | Enables interactive visualization and reporting for datasets |
| BigQuery optimization | Use partitioned or clustered tables for improved query performance |
| BigQuery partitioning | Helps optimize query costs by organizing data based on a specific column like date |
| BigQuery pricing model | Charges are based on storage and the amount of data processed during queries |
| BigQuery reserved slots | Offers guaranteed computing resources to improve query performance |
| BigQuery scalability | Supports petabytes of data with serverless querying |
| BigQuery slot-based pricing | Determines query performance based on the number of slots purchased |
| BigQuery use case | Best for analyzing large-scale datasets using SQL-like queries |
| Bigtable backup options | Supports table-level backups within a cluster, restorable to the state captured at backup time |
| Bigtable consistency model | Strongly consistent within a single cluster; replication across clusters is eventually consistent |
| Bigtable data model | Wide-column database optimized for sparse data |
| Bigtable indexing | Uses row keys for primary indexing with no built-in secondary indexes |
| Bigtable locality | Stores data physically adjacent based on row keys for faster access |
| Bigtable query limitations | Requires row-key optimization as it does not natively support complex joins or aggregations |
| Bigtable region distribution | Can replicate clusters across multiple zones or regions for disaster tolerance |
| Bigtable replication | Used for availability and disaster recovery purposes |
| Bigtable row design | Design row keys to optimize data access patterns |
| Bigtable scaling | Automatically adjusts to handle increased throughput or storage without downtime |
| Bigtable use case | Best for low-latency operations on large-scale time-series data |
| Cloud Datastore vs Bigtable | Datastore is better for transactional consistency, while Bigtable is better for analytics and high throughput |
| Cloud Firestore use case | Best for mobile and web applications requiring offline support and real-time synchronization |
| Cloud Firestore vs Datastore | Firestore provides advanced querying and offline support, while Datastore offers simpler APIs |
| Cloud Spanner use case | Best for horizontally scalable relational databases with strong consistency requirements |
| Cloud SQL backups | Supports automated and on-demand backups for disaster recovery |
| Cloud SQL replication types | Supports both asynchronous and synchronous replication for high-availability scenarios |
| Cloud SQL use case | Best for relational databases requiring compatibility with MySQL, PostgreSQL, or SQL Server |
| Cloud Storage data lifecycle management | Policies automatically delete or transition objects between tiers based on age |
| Cloud Storage IAM | Used for granular control over who can access and manage data |
| Cloud Storage scalability | Automatically scales to handle large amounts of unstructured data |
| Cloud Storage signed URLs | Provides temporary access to specific objects using a time-limited URL |
| Cloud Storage tiers | Standard, Nearline, Coldline, and Archive |
| Cloud Storage use case | Best for storing unstructured data like images, videos, and backups |
| Cloud Storage vs Persistent Disk | Cloud Storage is object storage, while Persistent Disk is block storage attached to VMs |
| Coldline Storage ideal use case | Best for data accessed less than once a quarter (90-day minimum storage duration) |
| Datastore access control | Uses IAM policies to define permissions at project, entity group, or key levels |
| Datastore indexing | Automatically indexes properties for queries but allows custom index configuration |
| Datastore queries | Support strong or eventual consistency depending on query type |
| Datastore relationship modeling | Supports nested entities and ancestor query patterns |
| Datastore transactions | Allow atomicity for multiple operations across multiple entities |
| Datastore use case | Best for scalable NoSQL applications and transactional workloads |
| Nearline Storage ideal use case | Best for data accessed less than once a month but at least once a quarter (30-day minimum storage duration) |
| Streaming data to BigQuery use case | Best for real-time analytics and dashboards |
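The partitioning and clustering cards above can be made concrete with BigQuery DDL. This is a sketch only; the dataset, table, and column names (`mydataset.events`, `event_date`, `customer_id`) are hypothetical:

```sql
-- Hypothetical table: partition by day on event_date,
-- cluster by customer_id to co-locate related rows.
CREATE TABLE mydataset.events (
  event_date  DATE,
  customer_id STRING,
  payload     JSON
)
PARTITION BY event_date
CLUSTER BY customer_id;

-- Queries that filter on the partition column prune
-- partitions and scan (and bill for) fewer bytes:
SELECT customer_id, COUNT(*) AS n
FROM mydataset.events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY customer_id;
```

Because BigQuery charges on-demand queries by bytes processed, the partition filter directly reduces cost, while clustering narrows the blocks read within each partition.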
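The Bigtable cards note that row keys are the only built-in index and that key design drives access patterns. A common pattern for time-series data is to prefix the key with an entity id and append a zero-padded reversed timestamp so the most recent rows sort first. A minimal sketch in plain Python — `make_row_key` is an illustrative helper, not part of any Bigtable client library:

```python
# Largest 13-digit millisecond timestamp; keeps the reversed
# value at a fixed width so lexicographic order == numeric order.
MAX_TS = 10**13 - 1

def make_row_key(entity_id: str, ts_millis: int) -> str:
    """Build a key like 'device42#8299999999999' where newer
    events sort lexicographically before older ones."""
    reversed_ts = MAX_TS - ts_millis
    return f"{entity_id}#{reversed_ts:013d}"

older = make_row_key("device42", 1_700_000_000_000)
newer = make_row_key("device42", 1_700_000_100_000)
# Newer events get lexicographically smaller keys,
# so a prefix scan returns the latest data first.
assert newer < older
```

Grouping all of one entity's rows under a shared prefix also preserves Bigtable's locality property from the cards above: adjacent keys are stored physically adjacent.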
About the Flashcards
Flashcards for the GCP Professional Data Engineer exam focus on Google Cloud's core data storage and analytic services. Each card pairs product capabilities with ideal use cases, guiding you on when to select Cloud Storage, BigQuery, Bigtable, Datastore, Cloud SQL, Spanner, or Firestore. Architecture choices like object versus block storage and real-time versus batch analytics are reinforced.
Review key concepts such as storage tiers, partitioning, clustering, replication, pricing models, IAM, and data lifecycle policies. The deck also highlights performance optimization, consistency trade-offs, backup strategies, and integration patterns, providing concise reminders of the terminology and decision points most likely to appear on exam scenarios.
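The lifecycle policies mentioned above are expressed as a JSON document attached to a bucket, for example with `gsutil lifecycle set policy.json gs://my-bucket`. A sketch, with illustrative age thresholds that follow the tier minimums on the cards:

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
```

Each rule pairs an action (transition or delete) with a condition; here objects step down through Nearline and Coldline as they age and are deleted after a year.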
Topics covered in this flashcard deck:
- Google Cloud storage tiers
- BigQuery architecture & pricing
- Bigtable design & scalability
- Datastore and Firestore consistency
- Cloud SQL & Spanner relational options
- IAM and data lifecycle