CompTIA Data+ Practice Test (DA0-002)
Use the form below to configure your CompTIA Data+ Practice Test (DA0-002). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

CompTIA Data+ DA0-002 (V2) Information
The CompTIA Data+ exam is for people who want to show they understand how to work with data. Passing it proves that you can collect, organize, and analyze information to help businesses make smart choices. It also checks whether you know how to create reports, use charts, and follow rules that keep data safe and accurate. CompTIA suggests having about 1 to 2 years of experience working with data, databases, or tools like Excel, SQL, or Power BI before taking the test.
The exam has different parts, called domains. These include learning basic data concepts, preparing data, analyzing it, and creating easy-to-read reports and visualizations. Another important part is data governance, which covers keeping data secure, private, and high quality. Each section of the test has its own percentage of questions, with data analysis being the largest part at 24%.
Overall, the CompTIA Data+ exam is a good way to prove your skills if you want a career in data. It shows employers that you know how to handle data from start to finish, including collecting it, checking it for errors, and sharing results in clear ways. If you enjoy working with numbers and information, this certification can be a great step forward in your career.

Free CompTIA Data+ DA0-002 (V2) Practice Test
- 20 Questions
- Unlimited time
- Data Concepts and Environments
- Data Acquisition and Preparation
- Data Analysis
- Visualization and Reporting
- Data Governance
During a pilot with smart shelves, sensors publish an inventory event every 5 seconds to an Event Hub. Regional merchandisers rely on a Power BI dashboard to reorder items as soon as stock is low. Today the dashboard reads from a table that an ETL job overwrites with an hourly timestamped snapshot. Executives say they still miss stock-outs by nearly an hour and do not want users to click Refresh. As the data analyst, which data-versioning technique will ensure the dashboard tiles update automatically within seconds of each new event?
Retain the hourly snapshot and instruct users to force a browser refresh when needed.
Reduce the ETL schedule to every 30 minutes but keep the snapshot history for analysis.
Switch to an overnight full batch load and email merchandisers a CSV extract of inventory levels.
Configure a real-time streaming dataset that pushes each inventory event directly to the dashboard.
Answer Description
A real-time streaming (push) dataset keeps an open connection between the data source and the dashboard so each incoming event immediately redraws the visual, meeting the seconds-level SLA. Shortening the ETL to 30 minutes, keeping manual refresh, or moving to a nightly batch still rely on point-in-time snapshots, leaving minutes or hours of latency and risking missed stock-outs.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a real-time streaming dataset in Power BI?
What is an ETL process, and why is it less suitable for real-time scenarios?
How does a push dataset differ from a pull dataset in Power BI?
After last night's refresh, the ecommerce dashboard is showing every order three times, so the total revenue KPI has tripled. The visualization uses a live connection to the company's Snowflake data warehouse, and the ETL job's execution log reported no errors. No changes were published to the dashboard itself. Which action should the data analyst take first to determine whether the duplication originates in the visualization layer or in the warehouse?
Clear the dashboard's cache and republish the workbook.
Increase the BI server's memory allocation to handle larger result sets.
Post a question to the BI vendor's community forum to ask about duplicate-row rendering defects.
Run a SQL query on the warehouse table to count rows per unique order ID and check for duplicates.
Answer Description
Duplicated records can be introduced either in upstream tables (for example through an ETL load or an unexpected one-to-many join) or by the visualization layer's rendering logic. The quickest way to isolate the source is to validate the data at its source: run a direct SQL query against the fact table and count the rows (or distinct order IDs). If duplicates exist in the table, the problem is upstream; if the counts are correct in SQL but wrong in the dashboard, the issue is in the BI tool configuration. Clearing caches, tuning server memory, or opening a vendor ticket may be necessary later, but none of those steps establishes where the bad data appears, so they are not the appropriate first move.
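A minimal sketch of that validation query, assuming the fact table is named orders and its key column is order_id (both names are illustrative):

SELECT order_id,
       COUNT(*) AS row_count        -- more than one row per order ID indicates duplication
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC;

If this returns rows, the duplication is upstream in the warehouse; if it returns nothing, the investigation shifts to the dashboard's joins or filters.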
Ask Bash
What is Snowflake, and why is it used as a data warehouse?
What is an ETL job, and how does it relate to data integrity?
Why is running a SQL query the best way to check for data duplication?
A data analyst at a healthcare organization is preparing a dataset for a university research study on patient outcomes. The dataset contains sensitive Personal Health Information (PHI). To comply with privacy regulations and protect patient identities, while still providing valuable data for statistical analysis, which of the following data protection practices is the MOST appropriate to apply before sharing the dataset?
Encryption at rest
Role-based access control (RBAC)
Data masking
Anonymization
Answer Description
The correct answer is Anonymization. Anonymization is the process of removing or modifying personally identifiable information (PII) and personal health information (PHI) to ensure that the individuals who are the subjects of the data cannot be identified. This technique is essential when sharing sensitive data for external research or statistical analysis, as it protects privacy while maintaining the analytical value of the data.
- Data masking is a plausible but less appropriate choice. Masking typically replaces sensitive data with realistic but fabricated data and is most often used for internal purposes like software testing or training where the data format must be preserved. While it is a form of data obfuscation, anonymization is the more precise term for the goal of making re-identification impossible for research purposes.
- Encryption at rest protects data that is stored on a disk or in a database by making it unreadable without a decryption key. It does not protect the data once it has been decrypted and shared with the third-party researchers.
- Role-based access control (RBAC) is a method for managing who can access data within a system based on their job function. It controls access permissions but does not alter the data itself to make it safe for sharing with an external entity.
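For illustration only, here is a simplified de-identification sketch in SQL; the table and column names are hypothetical, and true anonymization also requires assessing re-identification risk across quasi-identifiers before release:

SELECT
    NEWID() AS study_record_id,                                        -- surrogate key not derived from the patient
    DATEDIFF(year, date_of_birth, GETDATE()) / 10 * 10 AS age_band,   -- generalize age to a decade band
    LEFT(postal_code, 3) AS postal_area,                               -- truncate geographic detail
    diagnosis_code,
    outcome_score
FROM dbo.PatientOutcomes;                                              -- name, SSN, and address columns are simply not selected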
Ask Bash
What is the difference between anonymization and pseudonymization?
Why is encryption at rest insufficient for securely sharing PHI with researchers?
How does data masking differ from anonymization?
A data analyst joins a project and is tasked with creating a report from a customer database they have never used before. To properly interpret the data, the analyst needs to understand the definitions of each field, their data types (e.g., string, integer, date), and any constraints, such as 'cannot be null'.
Which of the following documents would be the MOST direct and comprehensive resource for this specific information?
Data flow diagram
Data dictionary
Data lineage report
Hierarchy structure diagram
Answer Description
The correct answer is a data dictionary. A data dictionary is a centralized repository of metadata that contains detailed information about data elements. This includes the names, business definitions, data types, formats, and constraints for each field in a database, making it the essential resource for the task described.
- A data flow diagram is incorrect because it graphically represents how data moves through a system, but it does not detail the specific attributes of each data field.
- A data lineage report is incorrect because it traces the origin, movement, and transformations of data over time, rather than providing a static definition of the data fields themselves.
- A hierarchy structure diagram is incorrect because it illustrates parent-child relationships between data entities, showing how they are organized, but it lacks the detailed field-level definitions found in a data dictionary.
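When no formal data dictionary exists yet, much of the technical metadata can be pulled from the database catalog itself. A minimal sketch for SQL Server, assuming the table is named Customer:

SELECT COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH,
       IS_NULLABLE                       -- 'NO' corresponds to a NOT NULL constraint
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Customer'
ORDER BY ORDINAL_POSITION;

A query like this surfaces field names, data types, and nullability, but the business definitions of each field still come from the data dictionary.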
Ask Bash
What is a data dictionary?
What types of constraints might exist in a database, and how are they described in a data dictionary?
How does a data dictionary differ from a data flow diagram or data lineage report?
Brianna, a data analyst at an e-commerce company, must present the results of a new customer-churn model to two audiences on the same day. The first audience is the Vice President of Customer Success, who will decide whether to fund a retention campaign; the second audience is the data-engineering team that will operationalize the model. Based on user-persona type, which communication approach should Brianna use when briefing the Vice President?
Walk through each feature-engineering step, hyper-parameter tuning results and the full confusion matrix, emphasizing data-quality caveats.
Open with a one-page executive summary that highlights projected revenue at risk and the expected ROI of a retention campaign, show one clear visualization, and keep detailed model information in backup slides.
Share the raw training data and Python notebooks in a shared repository and schedule a hands-on working session to review the code line by line.
Email a CSV containing every customer's predicted churn probability and ask the Vice President to choose which customers to target.
Answer Description
C-suite stakeholders care about strategic impact and actionable outcomes. A concise executive summary that quantifies business risk and ROI, paired with a single clear visual, lets the VP grasp why the analysis matters and what decision is needed without wading through technical detail. The other options emphasize code, granular metrics or raw data that are appropriate for technical peers but are likely to overwhelm or distract an executive audience.
Ask Bash
Why is an executive summary important for presenting to C-suite stakeholders?
What type of visualization would be effective for presenting to a VP?
How is communication with technical teams different from communication with executives?
A data analyst is designing a new fact table in Microsoft SQL Server to store the latitude-longitude coordinates of every retail branch. The business will run queries such as
SELECT TOP (5) BranchID
FROM dbo.Branch
WHERE @customerLocation.STDistance(Location) <= 10000 -- 10 km
ORDER BY @customerLocation.STDistance(Location);
to find the five closest branches to a customer anywhere in the world. The analyst needs the chosen column data type to perform built-in, accurate distance calculations that account for the curvature of the Earth without requiring custom formulas. Which SQL Server data type best meets these requirements?
varbinary(MAX)
geography
varchar(50)
geometry
Answer Description
The geography data type is designed for round-earth (ellipsoidal) coordinate systems. SQL Server implements methods such as STDistance, STBuffer and STIntersects that return geodesic results when the column is geography, so the distance between two latitude-longitude points is calculated along the Earth's surface. The geometry type, by contrast, assumes a flat (Euclidean) plane; its distance calculations become inaccurate over large areas and do not automatically use great-circle math. Storing coordinates in varchar or varbinary columns would prevent the database engine from using any spatial methods at all, leaving all calculations to the application layer. Therefore, choosing geography ensures accurate, built-in spatial querying for global distances, while the other options either yield planar errors or lack spatial awareness entirely.
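A minimal sketch of how the geography type is used in practice; the sample coordinates are illustrative:

DECLARE @customerLocation geography = geography::Point(40.7128, -74.0060, 4326);   -- latitude, longitude, SRID
SELECT TOP (5) BranchID,
       @customerLocation.STDistance(Location) AS DistanceMeters                    -- geodesic distance in meters
FROM dbo.Branch
WHERE @customerLocation.STDistance(Location) <= 10000
ORDER BY DistanceMeters;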
Ask Bash
What is the difference between the geography and geometry data types in SQL Server?
How does the STDistance method work with the geography data type?
Why is using a varchar or varbinary column problematic for storing spatial data in SQL Server?
An e-commerce company needs to store the complete clickstream generated by its website. Each event arrives as a semi-structured JSON document whose set of fields can change often as new marketing experiments roll out. The data layer must automatically distribute writes across many inexpensive servers to absorb sudden traffic spikes, yet the analysts do not require complex joins or multi-row ACID transactions. Given these requirements, which database type is the most appropriate choice?
Spreadsheet files stored on a network share
In-memory OLAP cube
Non-relational document database
Relational database management system (RDBMS)
Answer Description
A non-relational (NoSQL) document database is designed to ingest and query JSON-like documents that may differ in structure, so it handles the rapidly evolving clickstream schema without costly ALTER TABLE operations. Document stores also include built-in sharding and replication, letting the system scale horizontally across commodity nodes to keep pace with unpredictable surges in event volume. Relational databases impose a fixed schema and usually scale vertically, making them harder to adapt and distribute for this workload. In-memory OLAP cubes are optimized for aggregated analytical queries on pre-processed data, not for high-velocity operational writes. Spreadsheets (even when saved as CSV files) lack transactional guarantees, concurrency controls and distributed scalability, so they cannot serve as a reliable clickstream data store.
Ask Bash
Why is a non-relational document database suitable for handling semi-structured JSON data?
What is the role of sharding and replication in scaling a database?
Why aren't relational databases ideal for rapidly changing data schemas?
A data analyst is responsible for a weekly sales report. After a scheduled data refresh, the analyst observes that the key metric for total sales is 50% lower than the weekly average for the past year. The SQL query used for the report has not been altered, and there are no connection error messages in the reporting tool. The analyst's immediate goal is to troubleshoot this discrepancy. Which of the following actions is the most direct and effective first step to validate the data source?
Consult with the sales team to confirm if their recent sales figures were unusually low.
Run a simple aggregate query, such as COUNT(*) or MAX(transaction_date), directly against the source database table.
Enable verbose logging on the reporting server to capture more detailed execution data.
Re-authenticate the database connection credentials in the business intelligence (BI) tool's data source settings.
Answer Description
The correct answer is to run a simple aggregate query directly against the source table. When a report shows unexpected values, the first step is to determine whether the issue lies in the source data itself or in the analysis process (such as the query or BI tool). A simple query like COUNT(*) or MAX(transaction_date) quickly checks the completeness and freshness of the data at its source. This isolates the data source from other variables, such as the complex reporting query or the BI tool's rendering, making it the most direct and efficient first step in troubleshooting.
- Re-authenticating the database connection is less likely to be the issue because data is being returned, just an incorrect amount. A credential failure would typically result in a connection error, not incomplete data.
- Consulting the sales team introduces business context before the technical data validation is complete. An analyst should first confirm the data's integrity before escalating to business stakeholders to avoid raising false alarms.
- Enabling verbose logging is a more advanced troubleshooting step, useful for complex query or server issues, but it is not the most direct first step for simply validating the state of the source data.
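A minimal sketch of such a validation query, assuming the fact table is named sales_transactions (an illustrative name):

SELECT COUNT(*) AS row_count,                 -- is the overall volume in line with a normal week?
       MAX(transaction_date) AS latest_date   -- did last night's refresh actually load recent data?
FROM sales_transactions;

A row count far below normal or a stale latest date points to an upstream load problem; if the source looks complete, attention shifts to the report query or the BI tool.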
Ask Bash
Why is running a simple query like COUNT(*) the first troubleshooting step?
What should I do if the COUNT(*) query shows missing or incomplete data?
How does MAX(transaction_date) help validate the data?
A data analyst is tasked with developing a new, interactive sales performance dashboard for the executive team. The requirements are high-level, with the primary request being a 'single-pane-of-glass' view of key performance indicators (KPIs). The analyst wants to gather feedback on the proposed layout, color scheme, and selection of charts before connecting to the live database and writing complex queries. Which communication approach would be most effective for this purpose?
Develop a fully functional prototype with sample data.
Schedule a presentation to verbally describe the proposed dashboard.
Create a static mock-up of the dashboard.
Write a detailed technical specification document.
Answer Description
The correct answer is to create a static mock-up. A mock-up is a visual model or replica of a final product, used to get feedback on the design, layout, and overall feel before development begins. This is the most effective approach because it allows stakeholders, particularly a non-technical audience like an executive team, to visualize the final product and provide input early in the process, preventing costly rework.
A fully functional prototype is incorrect because it involves significant development effort that is premature at this stage. Prototypes are typically created after the basic design has been agreed upon. A technical specification document is not suitable for this audience or purpose, as it details the backend processes rather than the visual design for a non-technical C-suite. A verbal presentation alone is insufficient for conveying the nuances of a visual design and can easily lead to misunderstandings.
Ask Bash
What is a static mock-up?
Why is a static mock-up better than a fully functional prototype at this stage?
What is meant by 'single-pane-of-glass' in the context of dashboards?
Your data team has quantified the ROI of a recent marketing campaign and must brief two different groups: the executive leadership team during the quarterly business review and the marketing-operations analysts who will refine future campaigns. The underlying metrics and conclusions are identical, but you will create two separate slide decks to fit each audience. Which adjustment is MOST appropriate when tailoring the presentation specifically for the C-suite?
Increase chart granularity to daily spend and click-level data to show underlying variability.
Embed the full SQL script and transformation logic so leaders can audit the calculation lineage.
Open with an executive summary that highlights strategic impact and recommendations while omitting detailed methodology.
Append a comprehensive data dictionary that defines every field and value used in the analysis.
Answer Description
The C-suite typically wants a concise narrative that links the numbers to strategic objectives, financial impact, and recommended next steps. They have limited time and do not need to examine code, column definitions, or highly granular charts. Therefore, opening with a short executive summary that highlights the strategic relevance of the ROI, while omitting detailed methodology, best matches their needs. The other options focus on technical transparency (full SQL), deep data granularity (daily click-level charts), or reference documentation (comprehensive data dictionary). Those artifacts are valuable to individual contributors who will implement or audit the work, but they add unnecessary complexity for executives and risk obscuring the key business message.
Ask Bash
Why is an executive summary important for a C-suite audience?
What is the difference between presenting to C-suite executives and marketing-operations analysts?
Why aren’t technical artifacts like SQL scripts appropriate for a C-suite presentation?
A team is redesigning a SQL Server customer table that will store names from more than 40 countries, including languages that use Chinese, Arabic, and Cyrillic characters. Names can be up to 200 characters long, but most are under 50 characters. Which SQL string data type BEST satisfies the business requirements for international character support and keeps storage overhead low when the value is shorter than the maximum length?
varchar(200)
nchar(200)
nvarchar(200)
char(200)
Answer Description
The column must store multilingual text, so a Unicode-capable data type is mandatory. In SQL Server, the Unicode options are nchar and nvarchar.
- nchar(200) allocates a fixed 200 characters (400 bytes) for every row, wasting space on shorter names.
- nvarchar(200) is Unicode and variable-length; it stores only the bytes required for each value plus a small length indicator, so short names consume far less space.
- char(200) and varchar(200) are non-Unicode in most collations (unless UTF-8 is explicitly enabled), so they cannot reliably hold Chinese or Arabic characters.
Therefore, nvarchar(200) is the best fit because it supports full Unicode and minimizes storage for values that are typically shorter than the declared maximum.
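A minimal sketch of the recommended definition; the table and column names are assumptions:

CREATE TABLE dbo.Customer (
    CustomerID   int IDENTITY(1,1) PRIMARY KEY,
    CustomerName nvarchar(200) NOT NULL        -- Unicode and variable-length: short names use only the space they need
);

INSERT INTO dbo.Customer (CustomerName)
VALUES (N'王小明'), (N'Алексей Иванов'), (N'فاطمة الزهراء');   -- the N prefix marks Unicode string literals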
Ask Bash
What is the difference between char, varchar, nchar, and nvarchar in SQL?
Why is Unicode important for storing multilingual text in SQL?
What are some best practices when choosing string data types for database design?
A data analyst has created an interactive sales report in Power BI Desktop. The report needs to be shared with regional managers so they can view and interact with it online through their web browsers. Which Power BI component should the analyst use to publish and distribute the report?
Power BI Report Builder
Power BI Service
Power BI Desktop
Power BI Gateway
Answer Description
The correct answer is Power BI Service. Power BI Service is the cloud-based SaaS (Software as a Service) platform designed for sharing, collaborating on, and distributing interactive reports and dashboards. Reports are typically developed in Power BI Desktop and then published to the Power BI Service for consumption by end-users via a web browser or mobile app.
- Power BI Desktop is the free, local application used for authoring reports, performing data modeling, and creating visualizations; it is not the tool for web-based sharing and distribution.
- Power BI Report Builder is a separate tool used for creating paginated reports, which are optimized for printing or sharing as static documents like PDFs, not for interactive online reports.
- A Power BI Gateway acts as a secure bridge to allow the Power BI Service to access on-premises data sources for scheduled refreshes, but it is not the platform for publishing or sharing the reports themselves.
Ask Bash
What is the difference between Power BI Desktop and Power BI Service?
What is a Power BI Gateway, and when is it needed?
What are paginated reports, and how are they different from interactive reports?
A data analyst needs to load a large CSV file containing customer information into Python. The primary goal is to perform data cleaning, manipulation, and analysis on this tabular data, including tasks like filtering rows, grouping data, and calculating summary statistics. Which of the following Python libraries is specifically designed for these purposes, providing the DataFrame as its core data structure?
NumPy
scikit-learn
pandas
Matplotlib
Answer Description
The correct option is pandas. pandas is an open-source Python library designed for data manipulation and analysis. Its primary data structure, the DataFrame, is a two-dimensional table ideal for handling structured, tabular data like that from a CSV file. pandas provides extensive, high-performance functions for cleaning, filtering, grouping, and analyzing data, which directly matches the analyst's requirements.
- NumPy is a library for numerical computing in Python. While pandas is built on NumPy, NumPy's core data structure is the n-dimensional array, which is more suited for numerical calculations rather than the flexible, labeled data manipulation of tabular data that pandas provides.
- Matplotlib is a library for creating static, animated, and interactive visualizations in Python. An analyst would typically use Matplotlib to plot data after it has been cleaned and prepared using a library like pandas, not for the data manipulation itself.
- scikit-learn is a comprehensive library for machine learning in Python. It is used for tasks like classification, regression, and clustering. While it is used for data analysis, it is not the primary tool for the general-purpose data loading and manipulation described in the scenario.
Ask Bash
Why is a DataFrame in pandas considered ideal for tabular data?
How does pandas differ from NumPy in handling data?
Can pandas be used with other Python libraries like Matplotlib or scikit-learn?
A data analyst is defining a column to store customer email addresses in a SQL Server table. Values contain only ASCII characters and can be anywhere from 5 to 320 characters long, so the column should enforce a 320-character limit. To minimize storage while supporting this range, which column definition is the most appropriate?
varchar(320)
nvarchar(320)
varchar(MAX)
char(320)
Answer Description
VARCHAR stores variable-length, non-Unicode text and allocates only the bytes actually needed for each row (plus a small length overhead). Because most email addresses are shorter than the 320-character ceiling, a fixed-length CHAR column would waste space by padding shorter values, and NVARCHAR would double storage because it uses two bytes per character for Unicode. VARCHAR(MAX) allows strings far larger than 320 characters and has additional management overhead. Therefore, defining the column as VARCHAR(320) provides the required capacity with the smallest average storage footprint.
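A quick way to see the difference in SQL Server; the sample address is illustrative:

DECLARE @email_varchar varchar(320) = 'ana@example.com';
DECLARE @email_char    char(320)    = 'ana@example.com';

SELECT DATALENGTH(@email_varchar) AS varchar_bytes,   -- 15 bytes: only the characters actually stored
       DATALENGTH(@email_char)    AS char_bytes;      -- 320 bytes: padded to the full fixed length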
Ask Bash
What is the difference between VARCHAR and NVARCHAR?
Why does VARCHAR save space compared to CHAR?
When would VARCHAR(MAX) be a better choice than VARCHAR(n)?
A manufacturing company relies on three legacy desktop applications that do not expose APIs or direct database connectivity. Each night a junior analyst manually logs in, exports the day's production data to CSV, renames the files according to a strict convention, and copies them to a network share so an overnight ETL job can load them into the data warehouse. Management wants to eliminate this repetitive task without rewriting the legacy software. Which approach best matches the capabilities of robotic process automation (RPA) for this scenario?
Record the analyst's UI actions in an unattended software bot and schedule it to export, rename, and move the files every night.
Replace the legacy applications with microservice-based web APIs and rebuild the workflow around REST calls.
Train a convolutional neural network to predict missing production values and write the results directly to the warehouse.
Configure the ETL tool to query the legacy applications' databases directly through JDBC connections.
Answer Description
RPA bots are designed to mimic the exact mouse clicks and keystrokes a human performs in a graphical user interface, making them well-suited for automating repetitive, rule-based interactions with legacy systems that do not offer APIs or database access. Recording the analyst's steps and scheduling the bot to run unattended each night removes the manual effort while leaving the existing applications unchanged. Deep-learning models focus on predictive analytics, not UI automation; rebuilding the applications as microservices or wiring JDBC queries into systems that lack drivers would require extensive redevelopment and still would not use RPA technology.
Ask Bash
What is Robotic Process Automation (RPA)?
Why are APIs important, and why can their absence require alternative solutions like RPA?
What are the limitations of directly accessing a legacy application's database?
You are building a monthly revenue report in a PostgreSQL 15 database. The source table orders stores the precise purchase timestamp in the column order_created_at (TIMESTAMP). Before aggregating, you must transform every timestamp so that all rows from the same calendar month are grouped under the identical key (for example, 2025-03-15 and 2025-03-31 should both become 2025-03-01 00:00:00). Which single SQL expression will perform this normalization in one step so you can immediately use it in a GROUP BY clause?
EXTRACT(month FROM order_created_at)
DATE_TRUNC('month', order_created_at)
TO_CHAR(order_created_at, 'YYYY-MM')
DATEDIFF('month', order_created_at, CURRENT_DATE)
Answer Description
DATE_TRUNC('month', order_created_at) returns a TIMESTAMP whose lesser time fields are set to their lowest values, meaning the day becomes 1 and the time becomes 00:00:00. Grouping by this expression therefore buckets every timestamp from the same calendar month under one identical value.
EXTRACT(month FROM order_created_at) returns only the numeric month (1-12); grouping by it would mistakenly merge data from the same month across different years. TO_CHAR(order_created_at,'YYYY-MM') formats the value as a string, not a timestamp, and does not reset the day or time. DATEDIFF('month', order_created_at, CURRENT_DATE) is not available in PostgreSQL and, even if it were, would yield an integer offset rather than a normalized date. Hence, DATE_TRUNC is the only option that meets the requirement.
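Used in the monthly revenue report, the expression might look like this; the order_total column is an assumption:

SELECT DATE_TRUNC('month', order_created_at) AS order_month,
       SUM(order_total) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_created_at)
ORDER BY order_month;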
Ask Bash
What does the DATE_TRUNC function do in PostgreSQL?
Why is EXTRACT not suitable for grouping timestamps by month?
What is the difference between TO_CHAR and DATE_TRUNC?
A data analyst has created a line chart to display the quarterly sales performance of four different product lines over the last three years. Each product line is represented by a uniquely colored line. During a review, a manager notes that it is impossible to determine which product line corresponds to which colored line. To resolve this ambiguity and make the chart interpretable, which design element must the analyst add?
Data labels
A legend
Gridlines
A title
Answer Description
The correct answer is a legend. A legend serves as a key for a chart, identifying what the different colors, patterns, or symbols used in the visualization represent. In this scenario, with four different product lines each shown as a uniquely colored line, a legend is essential to connect each color to its corresponding product line, making the chart understandable.
- Data labels are incorrect because they display the specific value of individual data points on a chart. While they add precision, they do not solve the core problem of identifying which product line each colored line represents.
- A title is incorrect because it describes the overall purpose and content of the chart (e.g., 'Quarterly Sales by Product Line') but does not explain the individual elements within it.
- Gridlines are incorrect as they are background lines that help the viewer's eye align data points with the values on an axis. They provide a frame of reference for values but do not help in distinguishing between data categories.
Ask Bash
What is the primary purpose of a legend in data visualization?
How do data labels differ from a chart legend?
When would gridlines be important in a chart?
An online retailer has two AI-related initiatives:
Project A uses an AI service to automatically detect product names, competitor brands and overall sentiment in thousands of free-text customer reviews so that marketing dashboards update in near real time.
Project B aims to remove a nightly manual task in which an analyst opens a legacy desktop application, exports a CSV file, renames it and uploads it to a cloud folder.
Which combination of AI concepts best matches Project A and Project B, respectively?
Generative AI for Project A and deep learning for Project B
Deep learning classification for Project A and generative AI for Project B
Robotic process automation (RPA) for Project A and large language model (LLM) fine-tuning for Project B
Natural language processing (NLP) for Project A and robotic process automation (RPA) for Project B
Answer Description
Project A involves extracting structured information (entities and sentiment) from unstructured text. That task is a classic use case for natural language processing (NLP) techniques such as named-entity recognition and sentiment analysis. Project B, in contrast, is a deterministic, rule-based sequence of user-interface actions across existing applications; it does not require the system to "learn" from data, only to repeat prescribed steps, which is exactly what robotic process automation (RPA) is designed to do. The other pairings mis-align the concepts: generative AI focuses on creating new content rather than extracting entities; deep learning is a broad modeling approach rather than a process-automation tool; RPA does not perform text understanding; and large language models are specialized NLP architectures, not UI bots for file exports.
Ask Bash
What is Natural Language Processing (NLP)?
What is Robotic Process Automation (RPA)?
How does sentiment analysis work in NLP?
Your team has finished analyzing customer-satisfaction metrics for fiscal Q2. According to a mandate from corporate communications, the results summary must be a static document (such as a PDF, DOC, or image) for posting on the public investor-relations site. The content must also comply with Section 508 and WCAG-AA standards to ensure board members who rely on screen-reader software can independently review it. Finally, stakeholders will download the report on both mobile and desktop devices. Given these requirements, which communication approach provides the most accessible experience?
Publish an interactive HTML5 dashboard in which insights appear only when users hover over elements or interpret color cues.
Email the underlying Excel workbook that uses red/green conditional formatting to indicate trends but contains no supporting narrative.
Export the visual summary to a tagged PDF that includes alt text for every chart, a logical reading order, and a high-contrast color theme.
Record a narrated screencast walk-through of the dashboard and embed the video on the site without closed captions or transcripts.
Answer Description
Exporting the visuals to a properly tagged PDF that uses high-contrast colors and includes concise alt text meets all stated constraints. A tagged PDF carries a logical reading order and metadata that screen readers can parse, while alt text conveys the meaning of charts for users who cannot see them. High-contrast colors improve legibility for low-vision readers and satisfy WCAG contrast thresholds.
The interactive HTML5 dashboard is disallowed because the site only accepts static files, and hover-only interactions are not keyboard or screen-reader friendly. The color-coded Excel workbook fails WCAG 1.4.1 because color is the sole cue and offers no alt text or meaningful structure. The narrated screencast, lacking captions or a text alternative, excludes deaf or hard-of-hearing users and cannot be parsed by screen readers, so it does not satisfy the accessibility requirement.
Ask Bash
What is Section 508 and why is it important?
What does WCAG-AA compliance mean?
What are tagged PDFs and why are they essential for accessibility?
A regional retail chain tracks point-of-sale data that is loaded into its data warehouse every night by 04:00. The sales director wants store managers to open an existing Power BI dashboard at 08:00 each Monday and immediately see a summary of the previous week's results without having to click a refresh button or run a query. Which delivery approach best meets this requirement while minimizing manual effort?
Export the dashboard as a static PDF every Friday afternoon and email it to all store managers.
Switch the dataset to DirectQuery so the dashboard streams live transactions whenever someone opens it.
Provide an ad-hoc report template that managers must run and filter themselves each Monday morning.
Configure a scheduled refresh that runs at 05:00 every Monday so the dashboard is updated before managers log in.
Answer Description
Because the managers need the same metrics on a predictable cadence (every Monday) and the source data is already finalized overnight, the most efficient solution is to set up an automated, recurring dashboard refresh that runs early Monday morning. A scheduled refresh guarantees the dashboard opens with fresh data and eliminates the need for users to trigger an update. Real-time DirectQuery is unnecessary for once-per-week consumption and can add gateway overhead; an ad-hoc template still requires each manager to generate the report manually; exporting a static PDF on Friday would leave three days of weekend sales missing by Monday.
Ask Bash
What is a scheduled refresh in Power BI?
Why is DirectQuery not suitable for this scenario?
What are the limitations of static PDF exports for dashboards?