Data Concepts and Environment Fundamentals Flashcards
CompTIA DataX DY0-001 (V1) Flashcards

| Front | Back |
| --- | --- |
| Batch processing | Processing a large volume of data in scheduled groups (batches) rather than record by record as it arrives |
| Big data | Large and complex datasets that require advanced tools for storage, processing, and analysis |
| Blockchain data storage | Decentralized, tamper-evident method of storing data in linked blocks across a network of nodes |
| Cloud computing | Using remote servers hosted on the internet to store, manage, and process data |
| Columnar database | A database that stores data by columns, optimized for analytical workloads |
| Concurrent processing | Making progress on multiple tasks during overlapping time periods, whether interleaved on one processor or run in parallel on several |
| Data anonymization | Masking or removing identifying information to protect privacy |
| Data audit trails | Records showing the history and transformations of data |
| Data compression | Reducing the size of data to save storage and improve performance |
| Data encodings | Methods for formatting data into a standardized representation such as UTF-8 |
| Data federation | Integrating data from different sources into a virtual unified view |
| Data governance | Policies and practices to ensure data quality, security, and compliance |
| Data integrity | Ensuring data is accurate, consistent, and reliable |
| Data lake | A storage solution that holds raw data in its native format before processing |
| Data lineage | Tracking where data comes from, how it moves, and where it ends up |
| Data modeling | The process of defining and diagramming the entities, attributes, and relationships in a data system |
| Data partitioning | Dividing data into smaller chunks to optimize performance and scalability |
| Data pipeline | A series of processes that move and transform data from source to destination |
| Data processing | The act of converting raw data into meaningful information |
| Data redundancy | Storing the same data in multiple locations, which can waste storage and cause inconsistencies when unintentional |
| Data scalability | The ability to handle increasing amounts of data without performance issues |
| Data storage | The method or technology used to save data such as HDD, SSD, or cloud storage |
| Data type | Represents the kind of data such as integer, float, string, or boolean |
| Data visualization | Representing data graphically to better understand trends and patterns |
| Data warehouse | A centralized repository for storing large amounts of structured data for analysis |
| Distributed computing | Using multiple machines to process large volumes of data |
| ETL process | Extract, Transform, Load - steps to move and prepare data for analysis |
| Foreign key | A field in one table that links to the primary key in another table |
| Immutable data | Data that cannot be changed after it is stored |
| Index in database | A structure that improves the speed of data retrieval operations |
| Metadata | Data that provides information about other data such as its format or origin |
| Non-relational database | A database that stores unstructured or semi-structured data in non-tabular models such as key-value pairs, documents, wide columns, or graphs |
| Normalization | Process of organizing data to reduce redundancy and improve consistency |
| NoSQL | Non-relational database systems designed for scalability and flexibility |
| OLAP | Online Analytical Processing designed for complex data analysis and decision-making |
| OLTP | Online Transaction Processing focused on handling routine transaction data |
| Primary key | A column or set of columns that uniquely identifies each record in a relational database table |
| Real-time processing | Processing data as it is generated to provide immediate results |
| Relational database | A type of database using tables with rows and columns to store structured data |
| Replication in databases | Maintaining copies of data on multiple servers for backup, fault tolerance, and high availability |
| Sharding | Dividing a database into smaller pieces to distribute the load across servers |
| Snapshot in databases | A point-in-time copy of the database for backup or analysis |
| SQL | Structured Query Language used to interact with relational databases |
| Structured data | Data organized in rows and columns often seen in relational databases |
| Transactional data | Data generated from business transactions like orders or payments |
| Unstructured data | Data with no predefined format like images, videos, and text documents |
About the Flashcards
Flashcards for the CompTIA DataX exam review the foundations of modern data management. Students can quickly recall key definitions for data types, structured and unstructured data, storage media, and essential database concepts such as tables, primary and foreign keys, indexing, normalization, and SQL versus NoSQL approaches.
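As a quick illustration of the relational terms above (tables, primary and foreign keys, indexing, SQL), here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are hypothetical and chosen only for the example; they are not part of the deck.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Primary key: uniquely identifies each customer row.
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # Foreign key: orders.customer_id must match an existing customers.customer_id.
    conn.execute(
        """CREATE TABLE orders (
               order_id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
               amount REAL NOT NULL)"""
    )
    # Index: speeds up lookups and joins on orders.customer_id.
    conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
    )
    conn.commit()
    return conn

def total_per_customer(conn):
    """SQL query joining the two tables through the foreign key."""
    return conn.execute(
        """SELECT c.name, SUM(o.amount)
           FROM customers AS c
           JOIN orders AS o ON o.customer_id = c.customer_id
           GROUP BY c.name
           ORDER BY c.name"""
    ).fetchall()
```

Calling `total_per_customer(build_demo_db())` returns one row per customer with their summed order amounts. Inserting an order whose `customer_id` has no matching customer raises `sqlite3.IntegrityError`, which is the foreign key protecting data integrity.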
Further cards cover data pipelines, ETL, data warehouses, and data lakes, along with the batch, real-time, and distributed processing strategies that underpin big-data analytics. Learners will also revisit governance, integrity, and privacy, together with partitioning, replication, sharding, and other scalability topics such as cloud computing and columnar storage, so they can recognize how data flows securely and efficiently through enterprise systems.
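The ETL and data pipeline cards can be pictured with a small batch example. The sketch below assumes a hypothetical CSV extract (`RAW_CSV`) and a hypothetical `orders` table; it extracts the rows, transforms them by dropping an unparseable amount and normalizing names, and loads the result into an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; the second row has an invalid amount.
RAW_CSV = """order_id,customer,amount
1,ada,25.00
2,grace,not_a_number
3,ada,40.50
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types, normalize names, skip rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].title(), amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned batch into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_pipeline(text, conn):
    """Run the three stages in order and return (row count, total amount)."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
```

This processes the whole input as one batch. A real-time variant would apply the same transform to each record as it arrives instead of waiting for a complete file.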
Topics covered in this flashcard deck:
- Data types & formats
- Relational vs NoSQL
- Storage & cloud tech
- Batch & real-time processing
- ETL, warehousing, lakes
- Governance and scalability
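
For the scalability topic, the sharding and partitioning cards describe splitting data so the load is spread across servers. One common way to decide where a record goes is to hash its key; the sketch below assumes a fixed number of shards and uses hypothetical helper names (`shard_for`, `distribute`).

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash.

    hashlib is used instead of the built-in hash() because the built-in
    string hash is randomized between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard they are routed to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

Simple modulo hashing like this moves most keys to a different shard when `num_shards` changes, which is why production systems often use consistent hashing or range-based partitioning instead.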