
CompTIA Data+ Practice Test (DA0-002)

Use the form below to configure your CompTIA Data+ Practice Test (DA0-002). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

Questions
Number of questions in the practice test
Free users are limited to 20 questions; upgrade for unlimited questions
Seconds Per Question
Determines how long you have to finish the practice test
Exam Objectives
Which exam objectives should be included in the practice test

CompTIA Data+ DA0-002 (V2) Information

The CompTIA Data+ exam is a test for people who want to show they understand how to work with data. Passing this exam proves that someone can collect, organize, and study information to help businesses make smart choices. It also checks if you know how to create reports, use charts, and follow rules to keep data safe and accurate. CompTIA suggests having about 1 to 2 years of experience working with data, databases, or tools like Excel, SQL, or Power BI before taking the test.

The exam has different parts, called domains. These include learning basic data concepts, preparing data, analyzing it, and creating easy-to-read reports and visualizations. Another important part is data governance, which covers keeping data secure, private, and high quality. Each section of the test has its own percentage of questions, with data analysis being the largest part at 24%.

Overall, the CompTIA Data+ exam is a good way to prove your skills if you want a career in data. It shows employers that you know how to handle data from start to finish, including collecting it, checking it for errors, and sharing results in clear ways. If you enjoy working with numbers and information, this certification can be a great step forward in your career.

  • Free CompTIA Data+ DA0-002 (V2) Practice Test
  • 20 Questions
  • Unlimited
  • Data Concepts and Environments
  • Data Acquisition and Preparation
  • Data Analysis
  • Visualization and Reporting
  • Data Governance
Question 1 of 20

During a star-schema redesign, you are asked to modify the Customer dimension so that analysts can compare sales using the attributes that were valid at the time of each purchase (for example, the customer's region on the sale date). The data team is comfortable generating surrogate keys and adding effective and end-date columns, and they do not want to overwrite or delete historical rows. Which slowly changing dimension technique best satisfies these requirements?

  • Type 4 - maintain a separate history table while the main table holds the current row

  • Type 2 - insert a new row with a new surrogate key and effective-date range for every change

  • Type 3 - add additional columns to store the previous value, keeping only limited history

  • Type 1 - overwrite the existing row so only the current state is kept
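
For context, a Type 2 change can be sketched in pandas as follows; the table layout, surrogate keys, and dates below are hypothetical rather than taken from the question.

import pandas as pd

# Hypothetical current Customer dimension row with a surrogate key and effective dating.
customer_dim = pd.DataFrame([
    {"customer_sk": 101, "customer_id": "C001", "region": "East",
     "effective_date": "2023-01-01", "end_date": "9999-12-31", "is_current": True},
])

# A Type 2 change: the customer moves to the West region on 2024-06-01.
change_date = "2024-06-01"
mask = (customer_dim["customer_id"] == "C001") & customer_dim["is_current"]

# Close out the existing row instead of overwriting or deleting it...
customer_dim.loc[mask, "end_date"] = change_date
customer_dim.loc[mask, "is_current"] = False

# ...then insert a new row with a new surrogate key and a fresh effective-date range.
new_row = {"customer_sk": 102, "customer_id": "C001", "region": "West",
           "effective_date": change_date, "end_date": "9999-12-31", "is_current": True}
customer_dim = pd.concat([customer_dim, pd.DataFrame([new_row])], ignore_index=True)

print(customer_dim)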

Question 2 of 20

A healthcare provider loads patient encounter records into a Snowflake data warehouse every hour. The data team already has automated monitors that alert on row-count volume, table freshness, and schema changes. They now want to be warned if the relative frequency of ICD-10 diagnosis codes drifts gradually over several days, something that could skew epidemiological trend reports but would not break the pipeline outright. Which additional automated data-quality monitor would most directly address this requirement?

  • A regex validation that ensures each diagnosis_code matches the pattern [A-Z][0-9][0-9].[0-9]

  • A not-null percentage rule that triggers when the diagnosis_code column contains more than 1% NULL values

  • A statistical distribution monitor that compares current diagnosis-code frequencies with historical baselines and alerts on significant deviations

  • A referential-integrity check that confirms every encounter links to an existing patient_id in the patient table
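
For context, a minimal Python sketch of a distribution monitor of this kind; the diagnosis codes, baseline frequencies, and alert threshold below are hypothetical.

from collections import Counter

def code_frequency_drift(current_codes, baseline_freqs, threshold=0.10):
    """Compare current diagnosis-code frequencies with a historical baseline."""
    counts = Counter(current_codes)
    total = sum(counts.values())
    current_freqs = {code: n / total for code, n in counts.items()}

    # Total variation distance between the two relative-frequency distributions.
    all_codes = set(baseline_freqs) | set(current_freqs)
    drift = 0.5 * sum(abs(current_freqs.get(c, 0.0) - baseline_freqs.get(c, 0.0))
                      for c in all_codes)
    return drift, drift > threshold

# Hypothetical baseline learned from several weeks of history.
baseline = {"E11.9": 0.40, "I10": 0.35, "J45.909": 0.25}
current = ["E11.9"] * 30 + ["I10"] * 55 + ["J45.909"] * 15
drift, alert = code_frequency_drift(current, baseline)
print(f"drift={drift:.2f}, alert={alert}")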

Question 3 of 20

Your organization is building a logistics dashboard that must display parcel-tracking events from a major shipping carrier within five minutes of each scan. The carrier can share its data in one of four ways: dropping a daily CSV file on an SFTP server, exposing an authenticated REST endpoint that returns JSON, presenting parcel status on public HTML pages, or emailing a weekly XLSX report. Which data-collection approach is the most appropriate data source to meet the dashboard's requirement while minimizing additional parsing effort?

  • Call the carrier's authenticated REST API to retrieve tracking events in JSON format.

  • Use a web-scraping script to extract tracking details from the carrier's HTML pages.

  • Download the daily CSV file from the SFTP drop and load it into the dashboard.

  • Import the weekly XLSX spreadsheet that the carrier emails to the operations team.
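
For context, a minimal sketch of pulling tracking events from an authenticated REST endpoint with the requests library; the URL, token, query parameters, and payload shape are all hypothetical.

import requests

API_URL = "https://api.example-carrier.com/v1/tracking-events"  # hypothetical endpoint
headers = {"Authorization": "Bearer <token>"}

response = requests.get(API_URL, headers=headers, params={"since_minutes": 5}, timeout=10)
response.raise_for_status()

payload = response.json()  # JSON parses directly into Python structures, no extra parsing layer
for event in payload.get("events", []):
    print(event["parcel_id"], event["status"], event["scanned_at"])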

Question 4 of 20

A data-analytics team is building a new pipeline that will run on a 100-node Apache Spark cluster. The team wants to (1) write code in the same language that Spark itself is implemented in, (2) gain immediate access to Spark's native DataFrame and Dataset APIs when a new Spark version is released, and (3) avoid the extra Py4J (or similar) serialization layer that adds cross-language overhead. According to the CompTIA Data+ list of common programming languages, which language should the team choose?

  • Java

  • Python

  • Scala

  • R

Question 5 of 20

Your team has finished analyzing customer-satisfaction metrics for fiscal Q2. According to a mandate from corporate communications, the results summary must be a static document (such as a PDF, DOC, or image) for posting on the public investor-relations site. The content must also comply with Section 508 and WCAG-AA standards to ensure board members who rely on screen-reader software can independently review it. Finally, stakeholders will download the report on both mobile and desktop devices. Given these requirements, which communication approach provides the most accessible experience?

  • Export the visual summary to a tagged PDF that includes alt text for every chart, a logical reading order, and a high-contrast color theme.

  • Record a narrated screencast walk-through of the dashboard and embed the video on the site without closed captions or transcripts.

  • Publish an interactive HTML5 dashboard in which insights appear only when users hover over elements or interpret color cues.

  • Email the underlying Excel workbook that uses red/green conditional formatting to indicate trends but contains no supporting narrative.

Question 6 of 20

A data analyst is working on a new social media analytics platform. The platform must store vast amounts of user-generated content, including posts and user profiles that have varying attributes. The system must also efficiently map and query complex user relationships, such as friendships and content interactions, to identify key influencers. Given these requirements for a flexible schema and relationship-focused queries, which type of non-relational database would be the MOST suitable choice?

  • Column-family database

  • Key-value store

  • Graph database

  • Document database
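
For context, relationship-focused queries such as influencer ranking can be sketched in memory with networkx; this is an illustrative stand-in rather than a graph database, and the users and edges below are hypothetical.

import networkx as nx

# Hypothetical friendship/interaction edges between users.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("eve", "alice"),
])

# Degree centrality as a rough proxy for influence in the network.
centrality = nx.degree_centrality(G)
influencers = sorted(centrality, key=centrality.get, reverse=True)[:3]
print(influencers)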

Question 7 of 20

A marketing department wants to optimize the allocation of its upcoming quarterly budget to maximize new customer acquisitions. A data analyst is tasked with developing a model that uses historical campaign performance data to recommend the most effective distribution of funds across various channels, such as social media, email, and pay-per-click advertising. Which type of statistical method is being employed to generate this recommendation?

  • Prescriptive analytics

  • Inferential analytics

  • Predictive analytics

  • Descriptive analytics
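
For context, recommendations of this kind are typically framed as an optimization problem. A minimal sketch using scipy's linear-programming solver, with hypothetical per-channel acquisition rates, budget, and spending caps:

from scipy.optimize import linprog

channels = ["social", "email", "ppc"]
acquisitions_per_dollar = [0.012, 0.020, 0.015]  # hypothetical rates from historical campaigns
budget = 50_000

# linprog minimizes, so negate the objective to maximize expected acquisitions.
c = [-rate for rate in acquisitions_per_dollar]
A_ub = [[1, 1, 1]]            # total spend cannot exceed the budget
b_ub = [budget]
bounds = [(0, 30_000)] * 3    # hypothetical per-channel spending caps

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
for name, spend in zip(channels, result.x):
    print(f"{name}: ${spend:,.0f}")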

Question 8 of 20

A data analyst is preparing a large dataset of digital photographs for a computer vision model. The key requirement is to reduce the overall file size of the dataset to optimize storage and processing speed. A slight, often imperceptible, reduction in image quality is acceptable to achieve this goal. Which file format should the analyst use?

  • .tiff

  • .jpg

  • .bmp

  • .png
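
For context, a minimal sketch of lossy compression with the Pillow library; the source file name and quality setting are hypothetical.

from PIL import Image

photo = Image.open("raw_photo.tiff")  # hypothetical uncompressed source image

# JPEG is lossy: a quality setting in the 80-90 range usually shrinks the file
# dramatically with little visible degradation.
photo.convert("RGB").save("photo_compressed.jpg", "JPEG", quality=85, optimize=True)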

Question 9 of 20

A data analyst is assigned to a project involving a large, established relational database. Before writing any queries, the analyst needs to identify the tables available, the columns within each table, and the primary and foreign key relationships that link them together. Which of the following database components provides this formal structural blueprint?

  • Bridge table

  • Database schema

  • Fact table

  • Data dictionary

Question 10 of 20

A team is redesigning a SQL Server customer table that will store names from more than 40 countries, including languages that use Chinese, Arabic, and Cyrillic characters. Names can be up to 200 characters long, but most are under 50 characters. Which SQL string data type BEST satisfies the business requirements for international character support and keeps storage overhead low when the value is shorter than the maximum length?

  • nvarchar(200)

  • nchar(200)

  • varchar(200)

  • char(200)

Question 11 of 20

After today's automated refresh, a retail sales dashboard suddenly shows a Gross Profit Margin of 215%, yet a manual SQL check of the ERP system still returns the expected 20% range. No ETL errors were logged, and other visuals appear normal. Before involving the data-engineering team, which review technique should the analyst apply first to efficiently locate the problem inside the report?

  • Ask a colleague to conduct a peer review focused on the dashboard's visual design and storytelling elements.

  • Configure an automated alert that emails the analyst whenever the profit-margin card exceeds a predefined threshold.

  • Examine the report's measures and aggregation formulas to verify that the profit-margin calculation logic is correct.

  • Perform a detailed line-by-line code review of the overnight ETL and SQL extraction scripts that populate the model.

Question 12 of 20

During an initial exploration of a CRM export that will feed a data warehouse, you notice the staging table contains two seemingly identical columns, Customer_ID and CRM_Customer_GUID. To test whether one column is a redundant copy of the other, you execute:

SELECT COUNT(*)
FROM   staging.crm_sales
WHERE  Customer_ID IS DISTINCT FROM CRM_Customer_GUID;

The query returns a count of 0. Based solely on this result, which conclusion is most appropriate?

  • The two columns store the same values in every record, so keeping both is redundant and one can be dropped.

  • Zero mismatches prove the table contains duplicate rows, so both columns are needed to remove those duplicates later.

  • The test shows that each column is unique but represents a different business key, so no redundancy exists.

  • Because there are no mismatches, Customer_ID must be a foreign key that references CRM_Customer_GUID, so both columns should remain.

Question 13 of 20

A data analyst is tasked with creating a deliverable for the sales leadership team to support their quarterly performance review meetings. The leadership team needs to see high-level KPIs but also wants the ability to interactively filter the data during the meeting. For example, they want to drill down into regional performance, compare product sales, and isolate data for specific sales representatives to understand the context behind the numbers. Which of the following delivery methods would BEST meet these requirements?

  • A static dashboard

  • An executive summary

  • A recurring static report

  • A dynamic dashboard

Question 14 of 20

While building a marketing dashboard, you receive newline-delimited JSON files where each record contains a nested "events" list and a "device" dictionary. You need to convert these nested structures into one flat, tabular DataFrame so the results can be loaded directly into a relational table-without writing custom loops or manual parsing code. Which pandas function provides the most straightforward way to perform this flattening step in Python?

  • pandas.melt()

  • pandas.pivot_table()

  • pandas.read_html()

  • pandas.json_normalize()
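
For context, a minimal pandas sketch of this flattening step; the field names and sample records below are hypothetical.

import json
import pandas as pd

raw_lines = [
    '{"user_id": 1, "device": {"os": "iOS", "model": "X"}, "events": [{"type": "click", "ts": 1}, {"type": "view", "ts": 2}]}',
    '{"user_id": 2, "device": {"os": "Android", "model": "Y"}, "events": [{"type": "click", "ts": 3}]}',
]
records = [json.loads(line) for line in raw_lines]

# One flat row per event, with the parent and nested device attributes repeated on each row.
df = pd.json_normalize(
    records,
    record_path="events",                                      # explode the nested list
    meta=["user_id", ["device", "os"], ["device", "model"]],   # keep parent fields
    sep="_",
)
print(df)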

Question 15 of 20

A data analyst is preparing a report on annual employee compensation. The dataset includes salaries for all employees, from entry-level positions to C-suite executives. The analyst observes that the salary distribution is heavily skewed to the right because of a few extremely high executive salaries. To communicate the most representative measure of a 'typical' employee's earnings without being distorted by these high values, which of the following mathematical functions should the analyst primarily use?

  • Median

  • Standard Deviation

  • Mode

  • Mean
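
For context, a minimal Python sketch showing how a few extreme values pull the mean upward while the median stays near a typical value; the salaries are hypothetical.

from statistics import mean, median

salaries = [48_000, 52_000, 55_000, 58_000, 61_000, 64_000, 950_000, 1_800_000]

print(f"mean:   {mean(salaries):,.0f}")    # pulled far upward by the two executive salaries
print(f"median: {median(salaries):,.0f}")  # still reflects a typical employee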

Question 16 of 20

A data analyst is working with a 'ProductNotes' text field in a sales database. This field contains user-entered notes and a product's Stock Keeping Unit (SKU). The SKUs are not in a separate, structured column. The SKU has a consistent format: it always starts with "SKU:", followed by exactly three uppercase letters, a hyphen, and five digits (e.g., 'SKU:ABC-12345'). The goal is to extract only the SKU identifier (e.g., 'ABC-12345') into a new column. Which of the following regular expressions should the analyst use to capture just the SKU identifier?

  • SKU:[A-Z]{3}-\d{5}

  • (SKU:[A-Z]{3}-\d{5})

  • SKU:([A-Z]{3}-\d{5})

  • SKU:([A-z]{3}-\d{5})
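
For context, a minimal Python sketch of applying a capture group so that only the identifier, not the literal "SKU:" prefix, is extracted; the sample note text is hypothetical.

import re

note = "Customer reported a damaged unit, SKU:ABC-12345, replacement sent."

# The parentheses capture only the identifier that follows the "SKU:" prefix.
match = re.search(r"SKU:([A-Z]{3}-\d{5})", note)
if match:
    print(match.group(1))  # ABC-12345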

Question 17 of 20

A data analyst is tasked with creating a new sales performance dashboard that will be distributed to both internal managers and external partners. The company has recently undergone a rebranding and published a strict corporate style guide. To ensure the dashboard is immediately recognizable and aligns with the new corporate identity, which of the following design elements should be the analyst's primary focus?

  • Select the most visually complex chart types to impress the external partners.

  • Incorporate the company's logo and official color palette as defined in the style guide.

  • Use a color-blind-friendly palette with the highest possible contrast.

  • Optimize the dashboard's data queries to ensure the fastest possible load times.

Question 18 of 20

During a routine data-profiling exercise on a new data warehouse, an analyst runs a foreign-key validation between the SalesFacts fact table and the ProductDim dimension table. The profiler report shows that 97.8% of the 1.2 million SalesFacts.ProductID values are found in ProductDim.ProductID, leaving 26,769 rows that reference a non-existent product. Which data-quality dimension is this report primarily quantifying?

  • Uniqueness

  • Timeliness

  • Completeness

  • Consistency
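
For context, a minimal pandas sketch of the same foreign-key profiling check, using hypothetical slices of the fact and dimension tables.

import pandas as pd

sales_facts = pd.DataFrame({"ProductID": [1, 2, 2, 3, 99]})
product_dim = pd.DataFrame({"ProductID": [1, 2, 3]})

# Rows whose ProductID has no match in the dimension are orphaned references.
orphaned = ~sales_facts["ProductID"].isin(product_dim["ProductID"])
match_rate = 1 - orphaned.mean()
print(f"{match_rate:.1%} of fact rows reference an existing product "
      f"({orphaned.sum()} orphaned rows)")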

Question 19 of 20

During the weekly data-load process, a junior data analyst runs a SQL view that casts the column quantity_sold to INT. This week the script fails and returns the runtime error:

Conversion failed when converting the varchar value 'N/A' to data type int.

The schema of the staging and target tables has not changed since the previous successful load. Which action should the analyst take first to troubleshoot the issue and prevent it from happening in future loads?

  • Increase the database server's memory allocation so the CAST operation can complete in memory.

  • Validate the source file and cleanse any non-numeric values in quantity_sold before loading the staging table.

  • Enable detailed query-plan logging on the database server to capture the statement's execution plan.

  • Rewrite the view to use a FULL OUTER JOIN instead of an INNER JOIN to eliminate rows with nulls.
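
For context, a minimal pandas sketch of profiling and cleansing non-numeric values before the staging load; the column contents below are hypothetical.

import pandas as pd

staging = pd.DataFrame({"quantity_sold": ["12", "7", "N/A", "30"]})

# Profile the offending values first, then coerce anything non-numeric to NaN
# so a downstream CAST to INT no longer fails.
bad_rows = staging[~staging["quantity_sold"].str.fullmatch(r"\d+")]
print(bad_rows)

staging["quantity_sold"] = pd.to_numeric(staging["quantity_sold"], errors="coerce")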

Question 20 of 20

After discovering that an internal spreadsheet containing employee Social Security numbers and salary data was accidentally shared with all staff on a corporate messaging platform, you remove the file within 10 minutes. The compliance group asks you to complete the initial data-breach incident report so the data-protection officer (DPO) can decide whether outside regulators must be notified. Which one of the following details is most critical to include in that first report?

  • A complete root-cause analysis that includes system patch-management logs

  • A brief description of the personal data exposed and an approximate count of the records affected

  • The marketing team's revision history for the spreadsheet

  • A cost-benefit analysis of notifying versus not notifying affected employees