
CompTIA Data+ Practice Test (DA0-002)

Use the form below to configure your CompTIA Data+ Practice Test (DA0-002). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

  • Questions: the number of questions in the practice test (free users are limited to 20 questions; upgrade for unlimited)
  • Seconds Per Question: determines how long you have to finish the practice test
  • Exam Objectives: which exam objectives should be included in the practice test

CompTIA Data+ DA0-002 (V2) Information

The CompTIA Data+ exam is for people who want to show they understand how to work with data. Passing it proves that someone can collect, organize, and analyze information to help businesses make smart choices. It also checks whether you know how to create reports, use charts, and follow rules that keep data safe and accurate. CompTIA suggests having about 1 to 2 years of experience working with data, databases, or tools like Excel, SQL, or Power BI before taking the test.

The exam has different parts, called domains. These include learning basic data concepts, preparing data, analyzing it, and creating easy-to-read reports and visualizations. Another important part is data governance, which covers keeping data secure, private, and of high quality. Each section of the test has its own percentage of questions, with data analysis being the largest part at 24%.

Overall, the CompTIA Data+ exam is a good way to prove your skills if you want a career in data. It shows employers that you know how to handle data from start to finish, including collecting it, checking it for errors, and sharing results in clear ways. If you enjoy working with numbers and information, this certification can be a great step forward in your career.

  • Free CompTIA Data+ DA0-002 (V2) Practice Test

  • 20 Questions
  • Unlimited time
  • Data Concepts and Environments
  • Data Acquisition and Preparation
  • Data Analysis
  • Visualization and Reporting
  • Data Governance
Question 1 of 20

A data analytics team requires shared access to a set of moderately sized datasets. The data is stored on a central server and accessed by team members through a mapped network drive on their workstations. This system organizes data in a familiar folder-and-file structure, allowing users to navigate through a directory tree to open files like .csv and .xlsx. Which storage type does this scenario describe?

  • Local storage

  • Block storage

  • File storage

  • Object storage

Question 2 of 20

A data analyst is working with a movie dataset where one of the columns, 'genres', contains a comma-separated string of all genres applicable to a single movie (e.g., 'Action,Adventure,Sci-Fi'). The analyst's objective is to calculate the total number of movies for each individual genre. To accomplish this, each genre for a given movie must be represented on its own row.

Which of the following data transformation techniques should the analyst use to restructure the 'genres' column for this analysis?

  • Binning

  • Parsing

  • Exploding

  • Imputation
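
For context, here is a minimal T-SQL sketch of splitting a delimited column into one row per value, assuming a hypothetical dbo.movies table with movie_id and genres columns (STRING_SPLIT requires SQL Server 2016 or later):

-- Produce one row per (movie, genre) pair from the comma-separated list
SELECT m.movie_id,
       LTRIM(s.value) AS genre
FROM dbo.movies AS m
CROSS APPLY STRING_SPLIT(m.genres, ',') AS s;

Grouping this result by genre and counting rows would then give the per-genre movie totals described in the scenario.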

Question 3 of 20

At 10:00 a.m., the vice president of marketing asks you to present the conversion-funnel results of yesterday's four-hour flash sale at a noon executive meeting. No existing KPI dashboards include that promotion, and the standard sales-operations dashboard will not refresh with yesterday's data until its normal overnight schedule. To meet the VP's deadline with the least delay, which dashboard frequency should you choose?

  • Create an ad hoc dashboard that queries yesterday's sale data and share an immediate link with the VP.

  • Build a continuously streaming real-time dashboard that refreshes every few seconds on the intranet.

  • Add the flash-sale metrics to the nightly recurring dashboard and provide the refreshed version tomorrow.

  • Export the weekly KPI dashboard to PDF and email it before the meeting.

Question 4 of 20

A data analyst at an e-commerce company is tasked with creating a customer segment for a targeted direct mail campaign. The business rule for this campaign is that every selected customer record must have a complete mailing address (street, city, state, and postal code). The analyst runs a query to check the customer table and discovers that while the customer_id, city, and state columns are fully populated, 30% of the records have null values in the postal_code column. Which data quality issue is the primary concern for the analyst in this scenario?

  • Incompleteness

  • Redundancy

  • Outliers

  • Duplication
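
As a quick illustration, a profiling query of the kind the analyst might run, assuming a hypothetical dbo.customers table:

-- Count how many customer records are missing a postal code
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN postal_code IS NULL THEN 1 ELSE 0 END) AS missing_postal_code
FROM dbo.customers;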

Question 5 of 20

A data analyst at a healthcare organization is preparing a dataset for a university research study on patient outcomes. The dataset contains sensitive Protected Health Information (PHI). To comply with privacy regulations and protect patient identities while still providing valuable data for statistical analysis, which of the following data protection practices is the MOST appropriate to apply before sharing the dataset?

  • Data masking

  • Role-based access control (RBAC)

  • Anonymization

  • Encryption at rest

Question 6 of 20

You are preparing a set of numeric customer-behavior features for a k-means clustering model. One of the variables, lifetime_value, is highly right-skewed and contains several extreme outliers that would dominate Euclidean distance calculations if left untreated. You want each feature to contribute proportionally to the distance metric without letting those few large values distort the scale. Which preprocessing technique should you apply before running the clustering algorithm?

  • Apply a logarithmic transformation followed by min-max scaling.

  • Apply Z-score standardization so each feature has mean 0 and standard deviation 1.

  • Apply min-max scaling to force every feature into a 0-1 range.

  • Apply a robust scaler that centers on the median and scales by the interquartile range.
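
For reference, a T-SQL sketch of one of the listed techniques (a logarithmic transformation followed by min-max scaling), assuming a hypothetical dbo.customer_features table; note that LOG() in T-SQL is the natural logarithm:

WITH logged AS (
    -- Compress the right-skewed tail before scaling
    SELECT customer_id,
           LOG(lifetime_value + 1) AS ltv_log
    FROM dbo.customer_features
),
bounds AS (
    SELECT MIN(ltv_log) AS lo, MAX(ltv_log) AS hi FROM logged
)
-- Min-max scale the transformed values into the 0-1 range
SELECT l.customer_id,
       (l.ltv_log - b.lo) / NULLIF(b.hi - b.lo, 0) AS ltv_scaled
FROM logged AS l
CROSS JOIN bounds AS b;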

Question 7 of 20

A U.S.-based retailer wants to replicate its PostgreSQL production database, which stores personal data about European Union customers, to a cloud analytics cluster located in Singapore. To satisfy the jurisdictional requirement portion of data-compliance planning, which action should the data team perform first?

  • Confirm that transferring EU personal data to Singapore is permitted and implement an approved cross-border transfer mechanism (for example, Standard Contractual Clauses).

  • Validate that the destination cluster enforces column-level encryption for all sensitive fields.

  • Update the data dictionary to reflect schema changes introduced in the analytics environment.

  • Ensure the replication job meets the required recovery-point and recovery-time objectives.

Question 8 of 20

A data analyst is tasked with processing thousands of unstructured customer reviews from a company's website. The goal is to quickly identify common topics, summarize feedback for different product lines, and understand overall sentiment. Manually reading all the reviews is not feasible due to the volume of data. Which of the following AI concepts is the most appropriate for generating human-like text summaries from this unstructured data?

  • Large language model (LLM)

  • Robotic process automation (RPA)

  • A dimensional table

  • Foundational model

Question 9 of 20

A data analyst is preparing a 250,000-row customer data set to train a supervised churn-prediction model. The target column, Churn_Flag, contains Yes/No values for 248,700 customers, while the remaining 1,300 rows have NULL in that column only; every feature in those 1,300 rows is otherwise complete and within expected ranges. Exploratory checks show that dropping 1,300 records will not materially change the class balance or statistical power of the model. The machine-learning library being used will raise an error if the target variable is missing. Which data-cleansing technique is MOST appropriate for handling the 1,300 affected rows before modeling?

  • Delete the 1,300 rows that have a NULL value in Churn_Flag before training the model.

  • Apply min-max scaling to the numeric features so the algorithm can ignore the NULL labels.

  • Impute each missing Churn_Flag with the most common class so the overall distribution is preserved.

  • Bin Churn_Flag into broader categories and keep the rows to maximize training data size.
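
As a small sketch of the row-removal approach named in the first option, assuming a hypothetical dbo.customer_churn table:

-- Keep only labeled rows for supervised training
SELECT *
INTO #training_set
FROM dbo.customer_churn
WHERE Churn_Flag IS NOT NULL;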

Question 10 of 20

A data analyst must ensure that a consolidated revenue report is created, saved to a shared drive, and emailed to executives automatically at 03:00 every night, when no employees are logged on. When configuring the robotic process automation (RPA) workflow, which bot type or deployment model is the most appropriate for this fully automated reporting requirement?

  • Deploy a test-environment bot that executes the workflow only when a QA engineer approves a build.

  • Configure an unattended bot and schedule it in the RPA orchestrator to run at 03:00.

  • Use an attended bot that the analyst launches manually each morning after logging in.

  • Rely on a citizen-developer desktop recorder that operates only while the analyst is active.

Question 11 of 20

A multinational retailer replicates its EU customer database from Frankfurt to several cloud regions worldwide for disaster-recovery analytics. During a GDPR compliance audit, the assessor finds that (1) the data is copied daily to U.S. and APAC regions without any approved transfer mechanism and (2) snapshots of the replicated database have been kept for three years because no retention policy exists. Which set of compliance measures would BEST remediate both findings while still allowing the business to keep a global backup?

  • Require unit and user-acceptance testing for each region and tag every snapshot with metadata to identify its business owner.

  • Tokenize cardholder data, reclassify the database under PCI DSS, and increase snapshot frequency so that no records are lost.

  • Adopt Standard Contractual Clauses (or another Article 46 safeguard) for the international transfers, restrict replication to approved regions, and create a documented retention schedule that deletes or anonymizes snapshots once the business purpose expires.

  • Encrypt the database in transit and at rest, mask sensitive columns, and enable automated data-quality profiling to detect drift.

Question 12 of 20

A marketing analyst receives a daily orders file named orders_2025-08-28.json from an internal API. Each JSON record represents a single order and contains a line_items array; every element of line_items is itself an object that holds product_id, quantity and unit_price. The analyst must ingest the data into a relational reporting table that has one row per order. Based on the characteristics of the .json format, which technical challenge is the analyst MOST likely to face during the load?

  • Decompressing the mandatory GZIP compression that JSON applies to all text files.

  • Translating 64-bit integers from big-endian to little-endian format before they can be stored in the database.

  • Flattening hierarchical objects and arrays that do not map cleanly to a two-dimensional row-and-column structure.

  • Converting extended characters because JSON lacks native support for Unicode (UTF-8) encoding.
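
To illustrate the flattening work involved, a T-SQL sketch using OPENJSON (SQL Server 2016 or later) to expand the nested line_items array into relational rows; the sample payload, variable, and column types are assumptions:

DECLARE @orders NVARCHAR(MAX) = N'[
  {"order_id": 1, "line_items": [
      {"product_id": "A1", "quantity": 2, "unit_price": 9.99},
      {"product_id": "B7", "quantity": 1, "unit_price": 4.50}]}
]';

-- Each order fans out into one row per element of its line_items array
SELECT o.order_id, li.product_id, li.quantity, li.unit_price
FROM OPENJSON(@orders)
     WITH (order_id   INT           '$.order_id',
           line_items NVARCHAR(MAX) '$.line_items' AS JSON) AS o
CROSS APPLY OPENJSON(o.line_items)
     WITH (product_id NVARCHAR(20)   '$.product_id',
           quantity   INT            '$.quantity',
           unit_price DECIMAL(10, 2) '$.unit_price') AS li;

Rolling these line-item rows back up to one row per order is the kind of restructuring that the hierarchical format forces before a relational load.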

Question 13 of 20

Block storage is often selected to host high-performance transactional databases in both cloud and on-premises environments. Which of the following characteristics best explains why block storage fits this workload?

  • It stores each dataset as immutable objects along with extensive custom metadata and automatically replicates or erasure-codes those objects across geographic regions to maximize durability.

  • It presents volumes to the operating system as raw disks whose data is divided into fixed-size blocks that can be addressed over protocols like iSCSI or NVMe/TCP, delivering consistently low I/O latency.

  • It exposes data through a hierarchical directory path over shared-file protocols such as NFS or SMB so multiple users can concurrently edit common files.

  • It stripes data across nodes using erasure coding to reduce capacity consumption, accepting higher write latency suited mainly for cold-archive workloads.

Question 14 of 20

A data analyst is developing a star schema to analyze sales performance. The goal is to aggregate key business metrics, such as quantity_sold and sale_amount, for every transaction. This central table must also connect to dimensional tables for Date, Product, and Store. Which type of table should the analyst use for this purpose?

  • Staging table

  • Dimensional table

  • Fact table

  • Bridge table
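
For reference, a skeleton of the central table described in the scenario, assuming the Date, Product, and Store dimension tables already exist; all names are illustrative:

CREATE TABLE dbo.FactSales (
    DateKey       INT            NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey    INT            NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    StoreKey      INT            NOT NULL REFERENCES dbo.DimStore (StoreKey),
    quantity_sold INT            NOT NULL,    -- additive measure
    sale_amount   DECIMAL(12, 2) NOT NULL     -- additive measure
);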

Question 15 of 20

A retail company has collected a vast dataset of millions of user-submitted images containing its products in various real-world settings. The goal is to develop a system that can automatically identify the specific product and its condition from each image. Which of the following AI concepts is best suited to handle this type of complex pattern recognition in large, unstructured datasets like images?

  • Automated reporting

  • Robotic Process Automation (RPA)

  • Deep learning

  • Natural Language Processing (NLP)

Question 16 of 20

A regional retail chain tracks point-of-sale data that is loaded into its data warehouse every night by 04:00. The sales director wants store managers to open an existing Power BI dashboard at 08:00 each Monday and immediately see a summary of the previous week's results without having to click a refresh button or run a query. Which delivery approach best meets this requirement while minimizing manual effort?

  • Switch the dataset to DirectQuery so the dashboard streams live transactions whenever someone opens it.

  • Provide an ad-hoc report template that managers must run and filter themselves each Monday morning.

  • Export the dashboard as a static PDF every Friday afternoon and email it to all store managers.

  • Configure a scheduled refresh that runs at 05:00 every Monday so the dashboard is updated before managers log in.

Question 17 of 20

During a performance review you discover that a reporting query contains this pattern:

SELECT ...
FROM (
    SELECT CustomerID, SUM(TotalDue) AS TotalSpent
    FROM dbo.Orders
    WHERE OrderDate >= '2024-01-01'
    GROUP BY CustomerID
) AS recent_orders
JOIN dbo.Orders o1 ON o1.CustomerID = recent_orders.CustomerID
JOIN dbo.Orders o2 ON o2.CustomerID = recent_orders.CustomerID;

The execution plan shows the derived subquery against the 50-million-row Orders table is executed three times, causing very high logical reads. Without changing the final results, which action is most likely to reduce execution time and I/O?

  • Add an OPTION (FORCESEEK) hint to every Orders reference to force index seeks during each scan.

  • Insert the subquery results into a local temporary table (#recent_orders), add an index on CustomerID, and join the main query to that temporary table.

  • Rewrite the derived subquery as a common table expression (CTE) so SQL Server can cache the result internally.

  • Add WITH (NOLOCK) hints to all Orders references to avoid locking during the scans.
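
For reference, a minimal T-SQL sketch of the temporary-table approach listed among the options, reusing the dbo.Orders pattern from the question; the final column list is illustrative:

-- Materialize the aggregation once instead of re-deriving it
SELECT CustomerID, SUM(TotalDue) AS TotalSpent
INTO #recent_orders
FROM dbo.Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID;

-- Index the join column so the joins back to dbo.Orders can seek
CREATE CLUSTERED INDEX IX_recent_orders_CustomerID ON #recent_orders (CustomerID);

-- The reporting query now reads the small temp table
SELECT recent_orders.CustomerID, recent_orders.TotalSpent
FROM #recent_orders AS recent_orders
JOIN dbo.Orders AS o1 ON o1.CustomerID = recent_orders.CustomerID
JOIN dbo.Orders AS o2 ON o2.CustomerID = recent_orders.CustomerID;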

Question 18 of 20

Your organization is designing a star schema for its e-commerce data warehouse. The model includes a very large Sales fact table that must join to a Product dimension table containing thousands of descriptive attributes (brand, category, size, color, etc.). To follow dimensional-modeling best practices and minimize storage and join costs in the fact table, which primary-key strategy is most appropriate for the Product dimension table?

  • A composite key of ProductSKU combined with EffectiveStartDate and EffectiveEndDate

  • An auto-incrementing integer surrogate key generated within the data warehouse

  • A globally unique identifier (GUID) assigned by the e-commerce application

  • A concatenated natural key made of SupplierID and ManufacturerPartNumber
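
A minimal sketch of the surrogate-key strategy named among the options, with illustrative column names:

CREATE TABLE dbo.DimProduct (
    ProductKey INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- narrow surrogate key referenced by the fact table
    ProductSKU NVARCHAR(40)  NOT NULL,                   -- business (natural) key kept as an attribute
    Brand      NVARCHAR(100) NULL,
    Category   NVARCHAR(100) NULL,
    Size       NVARCHAR(40)  NULL,
    Color      NVARCHAR(40)  NULL
);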

Question 19 of 20

A data analyst is helping the product team create a survey to measure customer satisfaction with a new feature in their mobile application. The primary goal is to collect quantitative data that can be used to calculate an average satisfaction score and monitor trends over time. Which of the following question types would be the MOST appropriate to include in the survey to meet this specific requirement?

  • A 5-point Likert scale question, such as 'How satisfied are you with the new feature?' with options from 'Very Dissatisfied' to 'Very Satisfied'.

  • A dichotomous question, such as 'Did you find the new feature useful?' with 'Yes' or 'No' answers.

  • A multiple-choice question, such as 'Which of the following best describes your experience with the new feature?' with options like 'Easy to use', 'Buggy', and 'Helpful'.

  • An open-ended question, such as 'What are your thoughts on the new feature?'.
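
As a small illustration of how coded scale responses become quantitative data, a query averaging 1-5 satisfaction scores from a hypothetical dbo.survey_responses table:

-- Responses stored as 1 (Very Dissatisfied) through 5 (Very Satisfied)
SELECT AVG(CAST(satisfaction_score AS DECIMAL(3, 2))) AS avg_satisfaction
FROM dbo.survey_responses
WHERE question_id = 'NEW_FEATURE_SAT';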

Question 20 of 20

A marketing manager at a subscription-based streaming service asks a data analyst to identify which current customers are most likely to cancel their subscriptions within the next 60 days. The analyst has access to a dataset containing each customer's viewing history, subscription tenure, and past support interactions, along with information on which similar customers have canceled in the past. The manager's goal is to proactively target these at-risk customers with a special retention offer. Which statistical method is MOST appropriate for fulfilling the manager's primary request?

  • Inferential

  • Predictive

  • Prescriptive

  • Descriptive