CompTIA Data+ Practice Test (DA0-002)

Use the form below to configure your CompTIA Data+ Practice Test (DA0-002). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

  • Questions: the number of questions in the practice test. Free users are limited to 20 questions; upgrade for unlimited access.
  • Seconds Per Question: determines how long you have to finish the practice test.
  • Exam Objectives: which exam objectives are included in the practice test.

CompTIA Data+ DA0-002 (V2) Information

The CompTIA Data+ exam is for people who want to show they understand how to work with data. Passing it demonstrates that you can collect, organize, and analyze information to help businesses make sound decisions. It also checks whether you know how to create reports, use charts, and follow rules that keep data safe and accurate. CompTIA suggests having about 1 to 2 years of experience working with data, databases, or tools like Excel, SQL, or Power BI before taking the test.

The exam is divided into sections called domains. These cover basic data concepts, data preparation, data analysis, and the creation of easy-to-read reports and visualizations. Another important domain is data governance, which covers keeping data secure, private, and high quality. Each domain has its own percentage of questions, with data analysis being the largest at 24%.

Overall, the CompTIA Data+ exam is a good way to prove your skills if you want a career in data. It shows employers that you know how to handle data from start to finish, including collecting it, checking it for errors, and sharing results in clear ways. If you enjoy working with numbers and information, this certification can be a great step forward in your career.

Free CompTIA Data+ DA0-002 (V2) Practice Test

Press Start when you are ready, or press Change to modify any settings for the practice test.

  • Questions: 20
  • Time: Unlimited
  • Included Topics:
    Data Concepts and Environments
    Data Acquisition and Preparation
    Data Analysis
    Visualization and Reporting
    Data Governance

Free Preview

This test is a free preview; no account is required.
Subscribe to unlock all content, keep track of your scores, and access AI features!

Question 1 of 20

Your retail organization needs to share a daily customer-transaction table with an external machine-learning vendor. The vendor must be able to join transactions that belong to the same shopper across multiple days, but must never receive personally identifiable information such as names, emails, or card numbers. Which data-sharing technique best meets both the privacy and analytical requirements?

  • Encrypt the entire dataset with AES-256 and email the decryption key to the vendor.

  • Replace each email address with a deterministic one-way hash before sending the file.

  • Remove all identifying columns and aggregate the data to store totals per store per day.

  • Mask sensitive fields by replacing the middle characters of names and card numbers with asterisks.
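
For illustration, here is a minimal T-SQL sketch of the deterministic one-way hashing approach described in the second option. The daily_transactions table and its columns are invented names for this example:

    -- daily_transactions, email, transaction_id, etc. are hypothetical names.
    -- Replace each email with a deterministic SHA-256 hash so the vendor can
    -- join the same shopper across days without ever seeing the raw address.
    -- In practice, concatenating a secret salt before hashing makes the
    -- pseudonym harder to reverse by brute force.
    SELECT
        CONVERT(VARCHAR(64), HASHBYTES('SHA2_256', LOWER(email)), 2) AS shopper_key,
        transaction_id,
        transaction_date,
        amount
    FROM daily_transactions;

Because the hash is deterministic, the same email always produces the same shopper_key, which preserves the cross-day join while withholding the PII itself.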

Question 2 of 20

Your organization is designing a star schema for its e-commerce data warehouse. The model includes a very large Sales fact table that must join to a Product dimension table containing thousands of descriptive attributes (brand, category, size, color, etc.). To follow dimensional-modeling best practices and minimize storage and join costs in the fact table, which primary-key strategy is most appropriate for the Product dimension table?

  • An auto-incrementing integer surrogate key generated within the data warehouse

  • A concatenated natural key made of SupplierID and ManufacturerPartNumber

  • A composite key of ProductSKU combined with EffectiveStartDate and EffectiveEndDate

  • A globally unique identifier (GUID) assigned by the e-commerce application
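
As a sketch of the surrogate-key approach named in the first option, with invented table and column names:

    -- dim_product and its columns are hypothetical names for this sketch.
    -- The warehouse generates a compact integer surrogate key; the natural
    -- business key (the SKU) is kept only as a lookup attribute.
    CREATE TABLE dim_product (
        product_key INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
        product_sku VARCHAR(40) NOT NULL,                   -- natural key, kept as an attribute
        brand       VARCHAR(100),
        category    VARCHAR(100)
        -- ...remaining descriptive attributes
    );

The fact table then stores only the small integer product_key, keeping its rows narrow and its joins to the dimension cheap.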

Question 3 of 20

A data analyst is responsible for exporting a dataset containing sensitive customer financial information from an on-premises production database to a cloud-based analytics platform. According to the company's data governance policy, all data must be protected from interception and unauthorized viewing while it is being transferred over the network. Which of the following data protection practices is the most critical for meeting this specific requirement?

  • Encryption in transit

  • Role-based access control

  • Encryption at rest

  • Data masking

Question 4 of 20

Your organization stores shipping records in a SQL Server table named shipments with the columns order_id (INT), order_date (DATETIME), and delivery_date (DATETIME). To monitor logistics performance, you must create a field that shows how many calendar days each order spent in transit so the data team can later filter for any orders that took more than 7 days. You are not allowed to overwrite existing columns. Which SQL statement meets the requirement by creating an appropriate derived variable?

  • SELECT order_id, DATEADD(day, order_date, delivery_date) AS shipping_days FROM shipments;

  • SELECT order_id, DATEDIFF(day, order_date, delivery_date) AS shipping_days FROM shipments;

  • SELECT order_id, delivery_date - order_date AS shipping_days FROM shipments;

  • SELECT order_id, DATEDIFF(day, delivery_date, order_date) AS shipping_days FROM shipments;
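
For context, SQL Server's DATEDIFF takes the start date before the end date, so the derived column can later be filtered exactly as the data team needs. A usage sketch, where the table and column names come from the question itself:

    -- shipping_days counts the calendar-day boundaries crossed between the
    -- order and delivery dates; the WHERE clause then finds slow orders.
    SELECT order_id,
           DATEDIFF(day, order_date, delivery_date) AS shipping_days
    FROM shipments
    WHERE DATEDIFF(day, order_date, delivery_date) > 7;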

Question 5 of 20

A data analyst is working with a data extract from a legacy system saved as product_list.txt. Which statement accurately describes a primary characteristic of a .txt file that the analyst must account for when preparing to load it into a database?

  • The file organizes data into a nested, hierarchical structure of objects and arrays.

  • The file is a binary format that requires a specific database driver or proprietary software to be read correctly.

  • The file format itself does not store metadata or formatting, so the analyst must infer the data's structure, such as the delimiter and character encoding.

  • The file inherently contains schema information, including data types and column headers, in an unformatted header block.
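
To see why that matters in practice, here is a hedged BULK INSERT sketch for SQL Server. The staging table, file path, delimiter, encoding, and header assumption are all guesses the analyst would have to confirm by inspecting the file:

    -- staging_products and the file path are assumed for this sketch.
    -- Nothing in a .txt file declares its own structure, so every WITH
    -- option below encodes an inference that must be verified first.
    BULK INSERT staging_products
    FROM 'C:\imports\product_list.txt'
    WITH (
        FIELDTERMINATOR = ',',      -- assumed delimiter
        ROWTERMINATOR   = '\n',     -- assumed line ending
        CODEPAGE        = '65001',  -- assumed UTF-8 encoding
        FIRSTROW        = 2         -- assumes a single header row
    );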

Question 6 of 20

A data analyst is designing a table to store user login events for a new global web application. The application will have users in multiple time zones, and it is critical that all login times are recorded in a standardized way to allow for accurate chronological analysis, regardless of the user's location. Which data type is the MOST appropriate for the column that will store the login event time?

  • Datetime

  • Integer

  • Timestamp

  • String

Question 7 of 20

A data analyst is creating a Microsoft SQL Server table to store customer feedback submitted in English, Spanish, Japanese, and Russian. The instance is running SQL Server 2017 with its default (non-UTF-8) collation. Which data type should be assigned to the FeedbackText column so that all characters are saved without corruption or loss of information?

  • CLOB

  • NVARCHAR

  • CHAR

  • VARCHAR
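
A minimal sketch of the NVARCHAR approach, using the FeedbackText column from the question and an invented table name:

    -- customer_feedback is a hypothetical table name for this sketch.
    CREATE TABLE customer_feedback (
        feedback_id  INT IDENTITY(1,1) PRIMARY KEY,
        FeedbackText NVARCHAR(2000)  -- UTF-16 storage covers Spanish, Japanese, Russian, etc.
    );

    -- The N prefix keeps the literal itself in Unicode.
    INSERT INTO customer_feedback (FeedbackText)
    VALUES (N'素晴らしいサービスでした');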

Question 8 of 20

A data analyst is preparing a 250,000-row customer data set to train a supervised churn-prediction model. The target column, Churn_Flag, contains Yes/No values for 248,700 customers, while the remaining 1,300 rows have NULL in that column only; every feature in those 1,300 rows is otherwise complete and within expected ranges. Exploratory checks show that dropping 1,300 records will not materially change the class balance or statistical power of the model. The machine-learning library being used will raise an error if the target variable is missing. Which data-cleansing technique is MOST appropriate for handling the 1,300 affected rows before modeling?

  • Impute each missing Churn_Flag with the most common class so the overall distribution is preserved.

  • Bin Churn_Flag into broader categories and keep the rows to maximize training data size.

  • Delete the 1,300 rows that have a NULL value in Churn_Flag before training the model.

  • Apply min-max scaling to the numeric features so the algorithm can ignore the NULL labels.
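
A one-statement sketch of the listwise-deletion option, assuming the training data sits in a hypothetical churn_training table:

    -- churn_training is an invented table name; Churn_Flag comes from the question.
    -- Remove only the rows whose target label is missing; their features
    -- cannot be used for supervised training anyway.
    DELETE FROM churn_training
    WHERE Churn_Flag IS NULL;

    -- Quick check that no unlabeled rows remain.
    SELECT COUNT(*) AS unlabeled_rows
    FROM churn_training
    WHERE Churn_Flag IS NULL;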

Question 9 of 20

A data analyst needs to combine two large fact tables named invoice_2024 and invoice_2025, each holding the same 15 columns and more than three million rows. Because invoice numbers are sequentially generated and never reused, the business has confirmed there will be no duplicate records across the two tables. The merged result will feed year-over-year revenue dashboards, so the SQL written for the merge step must minimize extra processing such as sorting or de-duplication. Which SQL construct should the analyst use to build the consolidated data set with the least overhead?

  • Use a UNION operator so that only distinct rows are returned.

  • Create an INNER JOIN between the tables on invoice_number.

  • Merge the tables with a FULL OUTER JOIN on invoice_number.

  • Use a UNION ALL operator to append invoice_2025 to invoice_2024.
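
The low-overhead construct looks like this in practice; the table names come from the question, and the column list is elided for brevity:

    -- UNION ALL appends the rows as-is. Unlike UNION, it skips the
    -- sort/deduplication pass, which the business has confirmed is safe
    -- because invoice numbers are never reused.
    SELECT *
    FROM invoice_2024
    UNION ALL
    SELECT *
    FROM invoice_2025;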

Question 10 of 20

During a quarterly audit, a data warehouse administrator discovers that an intern who moved from the marketing team to the risk team still belongs to both the Marketing_Analyst and Risk_Analyst database roles, giving her access to tables she no longer needs. The administrator wants to adjust the organization's role-based access control (RBAC) policy so that users keep only the permissions required for their current duties and nothing more. Which RBAC principle should the administrator emphasize to prevent this kind of permission creep?

  • Attribute-based access control

  • Least privilege

  • Mandatory access control

  • Separation of duties
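
In SQL Server terms, enforcing least privilege for the intern in this scenario could be as simple as the statement below; the login name is invented for the example:

    -- Marketing_Analyst comes from the question; intern_login is hypothetical.
    -- Revoke the role membership that no longer matches current duties.
    ALTER ROLE Marketing_Analyst DROP MEMBER intern_login;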

Question 11 of 20

A data-analytics team is building a new pipeline that will run on a 100-node Apache Spark cluster. The team wants to (1) write code in the same language that Spark itself is implemented in, (2) gain immediate access to Spark's native DataFrame and Dataset APIs when a new Spark version is released, and (3) avoid the extra Py4J (or similar) serialization layer that adds cross-language overhead. According to the CompTIA Data+ list of common programming languages, which language should the team choose?

  • Scala

  • Java

  • Python

  • R

Question 12 of 20

A U.S.-based retailer wants to replicate its PostgreSQL production database, which stores personal data about European Union customers, to a cloud analytics cluster located in Singapore. To satisfy the jurisdictional requirement portion of data-compliance planning, which action should the data team perform first?

  • Validate that the destination cluster enforces column-level encryption for all sensitive fields.

  • Ensure the replication job meets the required recovery-point and recovery-time objectives.

  • Confirm that transferring EU personal data to Singapore is permitted and implement an approved cross-border transfer mechanism (for example, Standard Contractual Clauses).

  • Update the data dictionary to reflect schema changes introduced in the analytics environment.

Question 13 of 20

A data analyst is preparing a dataset for a customer segmentation project that will use a distance-based clustering algorithm. The dataset includes the features annual_income, with values ranging from 30,000 to 180,000, and customer_satisfaction_score, with values on a scale of 1 to 5. The analyst is concerned that the annual_income feature will disproportionately influence the clustering results due to its much larger numeric range. Which data transformation technique should be used to prevent this issue and ensure all features contribute more equitably to the analysis?

  • Parsing

  • Binning

  • Imputation

  • Scaling
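
For reference, min-max scaling can even be expressed directly in SQL with window functions. The customers table and customer_id column are assumptions; the two feature names come from the question:

    -- customers and customer_id are invented names for this sketch.
    -- Rescale both features to the 0-1 range so neither dominates a
    -- distance calculation. The * 1.0 forces floating-point division,
    -- and NULLIF guards against a zero range.
    SELECT customer_id,
           (annual_income - MIN(annual_income) OVER ()) * 1.0
               / NULLIF(MAX(annual_income) OVER () - MIN(annual_income) OVER (), 0) AS income_scaled,
           (customer_satisfaction_score - MIN(customer_satisfaction_score) OVER ()) * 1.0
               / NULLIF(MAX(customer_satisfaction_score) OVER () - MIN(customer_satisfaction_score) OVER (), 0) AS satisfaction_scaled
    FROM customers;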

Question 14 of 20

A hospital analytics team must share a large set of patient treatment records with an external university researcher. Under GDPR Recital 26, the data will fall outside the scope of data-protection law only when the data subjects are not identifiable by any means that are reasonably likely to be used. Which data-preparation strategy BEST achieves true anonymization before release?

  • Remove names and mask the last four digits of each Social Security number, but keep full postcode and exact date of birth for analysis.

  • Encrypt the entire data set with AES-256 and give the researcher only the ciphertext while the hospital retains the decryption key.

  • Generalize quasi-identifiers (for example, turn exact birth dates into five-year age bands) and publish only aggregated records in which each row represents at least 50 patients, with no key linking back to individuals.

  • Replace every patient identifier with a random string and store the mapping table on a secure internal server.
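
A sketch of the generalize-and-aggregate strategy in SQL; the patient_treatments table and its columns are invented for the example:

    -- patient_treatments, birth_date, and treatment_code are hypothetical.
    -- Generalize exact birth dates into five-year age bands, publish only
    -- aggregates, and suppress any group smaller than 50 patients.
    SELECT (DATEDIFF(year, birth_date, GETDATE()) / 5) * 5 AS age_band_start,
           treatment_code,
           COUNT(*) AS patient_count
    FROM patient_treatments
    GROUP BY (DATEDIFF(year, birth_date, GETDATE()) / 5) * 5, treatment_code
    HAVING COUNT(*) >= 50;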

Question 15 of 20

A data analyst at an e-commerce company is tasked with analyzing customer transaction data stored in a database. The dataset contains the customer's full name, shipping address, and the complete, unencrypted 16-digit Primary Account Number (PAN). To adhere to the Payment Card Industry Data Security Standard (PCI DSS), which of the following is the most critical measure to apply to the PAN data at rest?

  • Ensure the data is sent from the database to the analyst's tool over an encrypted channel.

  • Anonymize the customer name and shipping address fields in the table.

  • Render the PAN unreadable using an approved method like truncation, tokenization, or encryption.

  • Implement role-based access control to limit who can query the transaction table.
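
Truncation, one of the approved methods named in the options, can be sketched as follows; the transactions table and pan column are assumptions:

    -- transactions and pan are hypothetical names for this sketch.
    -- Overwrite the stored PAN with a truncated value so the full number
    -- no longer exists at rest. Tokenization or encryption are the
    -- alternatives when the full PAN must remain recoverable.
    UPDATE transactions
    SET pan = RIGHT(pan, 4);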

Question 16 of 20

A data analyst is defining a column to store customer email addresses in a SQL Server table. Values contain only ASCII characters and can be anywhere from 5 to 320 characters long, so the column should be capped at 320 characters. To minimize storage while supporting this range, which column definition is the most appropriate?

  • varchar(MAX)

  • char(320)

  • varchar(320)

  • nvarchar(320)
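
The matching column definition, shown with an invented table name for context:

    -- customers is a hypothetical table name for this sketch.
    CREATE TABLE customers (
        customer_id INT IDENTITY(1,1) PRIMARY KEY,
        email VARCHAR(320) NOT NULL  -- single-byte, variable-length, capped at 320 characters
    );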

Question 17 of 20

During routine monitoring, a data analyst observes a surge of suspicious SQL queries from an external address targeting the company's production database that stores customer PII. The organization's incident-response procedure is based on NIST SP 800-61/800-171 guidance. Which action should the analyst take first to satisfy security-incident reporting requirements and preserve evidence?

  • Notify the designated incident-response contact immediately with a timestamp, affected system details, and preliminary indicators.

  • Wait for the forensics team to confirm data exfiltration, then report the event in the next change-control meeting.

  • Shut down the database server and delete temporary files that may contain traces of the attack.

  • Apply emergency patches to the database server to close the suspected vulnerability before telling anyone.

Question 18 of 20

A company stores its operational data in a MySQL 8.0 server. You have been asked to reverse-engineer the existing schema into an entity-relationship (EER) diagram, make minor table changes with a visual editor, and run ad-hoc SQL queries in the same workspace. Which tool from the CompTIA Data+ list is specifically designed for this end-to-end database modeling and administration task?

  • Power BI Desktop

  • pandas (Python library)

  • RStudio

  • MySQL Workbench

Question 19 of 20

A reporting analyst is designing a relational table to archive detailed customer feedback notes. Each note can contain up to 200 KB (about 200,000 characters) of plain Unicode text, and analysts will need to run SQL string functions, such as LIKE pattern searches and SUBSTRING extraction, directly against the stored content. Which data type should the analyst assign to the column that stores the feedback notes so the requirement is satisfied without imposing an unnecessary size limit?

  • CLOB (Character large object)

  • VARCHAR(255)

  • BLOB (Binary large object)

  • FLOAT
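
In SQL Server, for example, the CLOB role is played by NVARCHAR(MAX), which still supports the string functions the question calls for. The table and column names below are invented:

    -- feedback_archive and note_text are hypothetical names for this sketch.
    CREATE TABLE feedback_archive (
        note_id   INT IDENTITY(1,1) PRIMARY KEY,
        note_text NVARCHAR(MAX)  -- large Unicode text, no 255-character cap
    );

    -- LIKE and SUBSTRING work directly against the large-object column.
    SELECT note_id,
           SUBSTRING(note_text, 1, 200) AS preview
    FROM feedback_archive
    WHERE note_text LIKE N'%refund%';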

Question 20 of 20

A data analyst is working with a data file named telemetry.dat which was exported from a proprietary industrial control system. When the analyst attempts to open the file with a standard text editor, it displays a mix of readable text headers and large sections of unreadable, non-printable characters. What is the most likely characteristic of this file?

  • It is a universally standard Character Large Object (CLOB) file that requires a specific database driver to open.

  • It uses a proprietary or custom structure defined by the industrial control system's software.

  • It is a plain text file, and the unreadable characters indicate that the file was corrupted during export.

  • It is a compressed file archive, similar to a .zip file, containing multiple log files.