Data Storage and Databases (GCP PDE) Flashcards
GCP Professional Data Engineer Flashcards
| Front | Back |
| --- | --- |
| Archive Storage ideal use case | Best for long-term retention of data accessed less than once a year; has a 365-day minimum storage duration |
| BigQuery clustering | Sorts data within a table or partition by up to four columns so that queries filtering on those columns scan less data |
| BigQuery data sharing | Uses datasets and authorized views to share data securely across projects |
| BigQuery export formats | Supports exporting data in formats like CSV, JSON, and Avro to Cloud Storage |
| BigQuery federated queries | Enables querying external data sources like Google Sheets or Cloud Storage files |
| BigQuery flat-rate pricing | Provides predictable costs by purchasing a dedicated amount of query processing capacity |
| BigQuery integration with Looker Studio | Enables interactive visualization and reporting for datasets |
| BigQuery optimization | Use partitioned or clustered tables for improved query performance |
| BigQuery partitioning | Divides a table into segments by a date, timestamp, or integer-range column (or by ingestion time) so that queries filtering on it scan fewer bytes and cost less |
| BigQuery pricing model | Charges are based on storage and the amount of data processed during queries |
| BigQuery reserved slots | Offers guaranteed computing resources to improve query performance |
| BigQuery scalability | Supports petabytes of data with serverless querying |
| BigQuery slot-based pricing | Bills for compute capacity measured in slots; the number of slots available limits how much query work can run in parallel |
| BigQuery use case | Best for analyzing large-scale datasets using SQL-like queries |
| Bigtable backup options | Supports table backups that can be restored to a new table in the same or another instance |
| Bigtable consistency model | Strongly consistent within a single cluster; replication across clusters is eventually consistent by default |
| Bigtable data model | Wide-column database optimized for sparse data |
| Bigtable indexing | Uses row keys for primary indexing with no built-in secondary indexes |
| Bigtable locality | Stores data physically adjacent based on row keys for faster access |
| Bigtable query limitations | Requires row-key optimization as it does not natively support complex joins or aggregations |
| Bigtable region distribution | Can be deployed across multiple zones for disaster tolerance |
| Bigtable replication | Used for availability and disaster recovery purposes |
| Bigtable row design | Design row keys to optimize data access patterns |
| Bigtable scaling | Scales throughput by adding nodes to a cluster, manually or through autoscaling, without downtime |
| Bigtable use case | Best for low-latency operations on large-scale time-series data |
| Cloud Datastore vs Bigtable | Datastore is better for transactional consistency, while Bigtable is better for analytics and high throughput |
| Cloud Firestore use case | Best for mobile and web applications requiring offline support and real-time synchronization |
| Cloud Firestore vs Datastore | Firestore provides advanced querying and offline support, while Datastore offers simpler APIs |
| Cloud Spanner use case | Best for horizontally scalable relational databases with strong consistency requirements |
| Cloud SQL backups | Supports automated and on-demand backups for disaster recovery |
| Cloud SQL replication types | High availability uses synchronous replication to persistent disks in two zones with a standby instance; read replicas use asynchronous replication |
| Cloud SQL use case | Best for relational databases requiring compatibility with MySQL, PostgreSQL, or SQL Server |
| Cloud Storage data lifecycle management | Policies automatically delete or transition objects between tiers based on age |
| Cloud Storage IAM | Used for granular control over who can access and manage data |
| Cloud Storage scalability | Automatically scales to handle large amounts of unstructured data |
| Cloud Storage signed URLs | Provides temporary access to specific objects using a time-limited URL |
| Cloud Storage tiers | Standard, Nearline, Coldline, and Archive |
| Cloud Storage use case | Best for storing unstructured data like images, videos, and backups |
| Cloud Storage vs Persistent Disk | Cloud Storage is object storage, while Persistent Disk is block storage attached to VMs |
| Coldline Storage ideal use case | Best for data accessed at most about once a quarter; has a 90-day minimum storage duration |
| Datastore access control | Uses IAM roles granted at the project level; there are no per-entity or per-key permissions |
| Datastore indexing | Automatically indexes properties for queries but allows custom index configuration |
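| Datastore queries | In legacy Datastore, ancestor queries are strongly consistent and global queries are eventually consistent; Firestore in Datastore mode makes all queries strongly consistent |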
| Datastore relationship modeling | Supports nested entities and ancestor query patterns |
| Datastore transactions | Provide atomic execution of multiple operations across multiple entities |
| Datastore use case | Best for scalable NoSQL applications and transactional workloads |
| Nearline Storage ideal use case | Best for data accessed about once a month or less, such as recent backups; has a 30-day minimum storage duration |
| Streaming data to BigQuery use case | Best for real-time analytics and dashboards |
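
The Bigtable row design, indexing, and locality cards above can be made concrete with a small sketch. It assumes a hypothetical time-series workload in which readings are looked up per device, newest first. The key layout (`<device_id>#<reversed timestamp>`), the `make_row_key` helper, and the `MAX_TS_MS` bound are illustrative assumptions, not a prescribed Bigtable schema. The sketch only builds the key bytes and does not call the Bigtable client library.

```python
# Hypothetical upper bound for millisecond timestamps (13 decimal digits).
MAX_TS_MS = 10**13


def make_row_key(device_id: str, event_ts_ms: int) -> bytes:
    """Build a row key of the form <device_id>#<reversed timestamp>.

    Leading with the device ID keeps one device's rows adjacent and avoids
    starting every key with a timestamp, which would send all new writes to
    the same part of the key space. Reversing the timestamp makes newer
    events sort before older ones within a device.
    """
    if not 0 <= event_ts_ms < MAX_TS_MS:
        raise ValueError("timestamp out of supported range")
    reversed_ts = MAX_TS_MS - 1 - event_ts_ms
    # Zero-pad so that lexicographic byte order matches numeric order.
    return f"{device_id}#{reversed_ts:013d}".encode("utf-8")


if __name__ == "__main__":
    older = make_row_key("sensor-42", 1_700_000_000_000)
    newer = make_row_key("sensor-42", 1_700_000_060_000)
    print(older)
    print(newer < older)  # newer events sort first
```

Because Bigtable has no built-in secondary indexes, a layout like this serves one access pattern well (latest readings for a given device); other patterns would need a different key design or a second table.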
This deck focuses on GCP storage solutions, including Cloud Storage, Bigtable, Datastore, BigQuery, and how to choose the right storage option for a given use case.
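
As a rough illustration of choosing a Cloud Storage tier by access frequency, the sketch below maps the expected number of days between accesses to a storage class. The thresholds mirror the minimum storage durations of Nearline (30 days), Coldline (90 days), and Archive (365 days). The function name `suggest_storage_class` is hypothetical, and the rule is a simplification: a real decision would also weigh retrieval fees and latency requirements.

```python
def suggest_storage_class(days_between_accesses: float) -> str:
    """Suggest a Cloud Storage class from the expected access interval.

    Rule-of-thumb only (an assumption of this sketch): pick the coldest
    class whose minimum storage duration fits the access interval.
    """
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    if days_between_accesses >= 365:
        return "ARCHIVE"
    if days_between_accesses >= 90:
        return "COLDLINE"
    if days_between_accesses >= 30:
        return "NEARLINE"
    return "STANDARD"


if __name__ == "__main__":
    for days in (1, 30, 120, 400):
        print(days, suggest_storage_class(days))
```

The same thresholds can drive a lifecycle management policy, where objects transition to colder tiers as they age instead of being classified once at upload time.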