CompTIA Data+ Practice Test (DA0-002)
Use the form below to configure your CompTIA Data+ Practice Test (DA0-002). The practice test can be configured to include only certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

CompTIA Data+ DA0-002 (V2) Information
The CompTIA Data+ exam is a test for people who want to show they understand how to work with data. Passing this exam proves that someone can collect, organize, and study information to help businesses make smart choices. It also checks if you know how to create reports, use charts, and follow rules to keep data safe and accurate. CompTIA suggests having about 1 to 2 years of experience working with data, databases, or tools like Excel, SQL, or Power BI before taking the test.
The exam has different parts, called domains. These include learning basic data concepts, preparing data, analyzing it, and creating easy-to-read reports and visualizations. Another important part is data governance, which covers keeping data secure, private, and high quality. Each section of the test has its own percentage of questions, with data analysis being the largest part at 24%.
Overall, the CompTIA Data+ exam is a good way to prove your skills if you want a career in data. It shows employers that you know how to handle data from start to finish, including collecting it, checking it for errors, and sharing results in clear ways. If you enjoy working with numbers and information, this certification can be a great step forward in your career.
Free CompTIA Data+ DA0-002 (V2) Practice Test
Press start when you are ready, or press Change to modify any settings for the practice test.
- Questions: 20
- Time: Unlimited
- Included Topics: Data Concepts and Environments, Data Acquisition and Preparation, Data Analysis, Visualization and Reporting, Data Governance
Free Preview
This test is a free preview, no account required.
Subscribe to unlock all content, keep track of your scores, and access AI features!
Your retail organization needs to share a daily customer-transaction table with an external machine-learning vendor. The vendor must be able to join transactions that belong to the same shopper across multiple days, but must never receive personally identifiable information such as names, emails or card numbers. Which data-sharing technique best meets both the privacy and analytical requirements?
Encrypt the entire dataset with AES-256 and email the decryption key to the vendor.
Replace each email address with a deterministic one-way hash before sending the file.
Remove all identifying columns and aggregate the data to store totals per store per day.
Mask sensitive fields by replacing the middle characters of names and card numbers with asterisks.
Answer Description
Using a deterministic one-way hash (or similar tokenization approach) on customer identifiers converts each value into an irreversible surrogate that is identical every time the same input appears. This keeps PII out of the file while still allowing the vendor to join rows that refer to the same shopper. Encrypting the entire file and then handing over the key simply reveals the raw PII once it is decrypted. Aggregating the data removes the ability to link individual shoppers across days, preventing the required analyses. Partially masking names or card numbers leaves enough of the original values to risk re-identification and still exposes other direct identifiers such as email addresses.
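For illustration, a T-SQL sketch of the export query might look like the following (the table and column names are assumptions, and in practice a secret salt is often concatenated with the value before hashing to resist dictionary attacks):

-- Hypothetical export query: PII is replaced with a deterministic token
SELECT
    HASHBYTES('SHA2_256', LOWER(customer_email)) AS customer_token, -- same email always produces the same token
    transaction_id,
    transaction_date,
    amount
FROM daily_transactions;

Because the token is stable across daily extracts, the vendor can still group or join rows by shopper without ever seeing the underlying email address.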
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a deterministic one-way hash?
Why is deterministic hashing better than encryption for data sharing?
How does deterministic hashing prevent re-identification of PII?
Your organization is designing a star schema for its e-commerce data warehouse. The model includes a very large Sales fact table that must join to a Product dimension table containing thousands of descriptive attributes (brand, category, size, color, etc.). To follow dimensional-modeling best practices and minimize storage and join costs in the fact table, which primary-key strategy is most appropriate for the Product dimension table?
An auto-incrementing integer surrogate key generated within the data warehouse
A concatenated natural key made of SupplierID and ManufacturerPartNumber
A composite key of ProductSKU combined with EffectiveStartDate and EffectiveEndDate
A globally unique identifier (GUID) assigned by the e-commerce application
Answer Description
Dimensional modeling guidelines recommend assigning each dimension row a meaningless, sequential surrogate key-typically a 4-byte integer-rather than relying on natural business keys or wide composite keys. A compact surrogate key keeps the fact table's foreign-key columns small (saving space and index overhead), speeds join performance, shields the warehouse from changes to source-system codes, and is mandatory when tracking slowly changing dimension history. Natural keys, GUIDs and composite keys all consume more space, may change unexpectedly, and complicate surrogate-key lookups for historical versioning, so they are not preferred choices for a dimension table in a star schema.
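As a hedged sketch of what that looks like in T-SQL (table and column names are illustrative only):

CREATE TABLE dim_product (
    product_key  INT IDENTITY(1,1) PRIMARY KEY,  -- meaningless, sequential surrogate key
    product_sku  VARCHAR(40) NOT NULL,           -- natural business key kept as an ordinary attribute
    brand        VARCHAR(100),
    category     VARCHAR(100),
    size         VARCHAR(20),
    color        VARCHAR(30)
);

CREATE TABLE fact_sales (
    sales_key    BIGINT IDENTITY(1,1) PRIMARY KEY,
    product_key  INT NOT NULL REFERENCES dim_product (product_key), -- compact 4-byte foreign key
    order_date   DATE,
    quantity     INT,
    sales_amount DECIMAL(12,2)
);

The fact table carries only the small integer product_key, while all of the wide descriptive attributes stay in the dimension.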
Ask Bash
What is a surrogate key in dimensional modeling?
Why is a surrogate key better than a GUID in a star schema?
What are slowly changing dimensions, and how do surrogate keys help manage them?
A data analyst is responsible for exporting a dataset containing sensitive customer financial information from an on-premises production database to a cloud-based analytics platform. According to the company's data governance policy, all data must be protected from interception and unauthorized viewing while it is being transferred over the network. Which of the following data protection practices is the most critical for meeting this specific requirement?
Encryption in transit
Role-based access control
Encryption at rest
Data masking
Answer Description
The correct answer is encryption in transit. This practice specifically addresses the protection of data as it moves from one location to another, such as across the internet or a private network. Technologies like TLS (Transport Layer Security) are used to create a secure, encrypted channel to prevent eavesdropping or interception during the transfer, which directly addresses the requirement in the scenario.
- Encryption at rest is incorrect because it applies to data that is stored on a physical or virtual medium, like a hard drive or in a database, not data that is actively moving across a network. While important for securing the data before and after the transfer, it does not protect the data during the transfer.
- Data masking is incorrect as it involves obscuring specific data elements by replacing them with fictitious but realistic-looking data. This is useful for creating non-production datasets for testing or development but does not encrypt the entire dataset to protect it from interception during transmission.
- Role-based access control (RBAC) is an incorrect choice because it is an authorization mechanism that defines who can access specific resources based on their job function. It controls access to the endpoints (the database and the cloud platform) but does not protect the data itself from being intercepted as it travels between those endpoints.
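For context, one way to confirm on SQL Server that sessions are actually protected in transit is to query the connections DMV; forcing TLS itself is configured on the server and in the client connection string, so this is only a verification step:

-- Shows whether each current connection is TLS-encrypted
SELECT session_id, net_transport, encrypt_option
FROM sys.dm_exec_connections;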
Ask Bash
What is encryption in transit, and how does it work?
How is encryption in transit different from encryption at rest?
What is the role of TLS in securing data in transit?
Your organization stores shipping records in a SQL Server table named shipments with the columns order_id (INT), order_date (DATETIME), and delivery_date (DATETIME). To monitor logistics performance, you must create a field that shows how many calendar days each order spent in transit so the data team can later filter for any orders that took more than 7 days. You are not allowed to overwrite existing columns. Which SQL statement meets the requirement by creating an appropriate derived variable?
SELECT order_id, DATEADD(day, order_date, delivery_date) AS shipping_days FROM shipments;
SELECT order_id, DATEDIFF(day, order_date, delivery_date) AS shipping_days FROM shipments;
SELECT order_id, delivery_date - order_date AS shipping_days FROM shipments;
SELECT order_id, DATEDIFF(day, delivery_date, order_date) AS shipping_days FROM shipments;
Answer Description
A derived variable (also called a calculated or derived column) is a new field produced by applying an expression to existing columns while leaving the source data intact. In SQL Server, the DATEDIFF() function returns the integer difference between two dates in the specified unit, so DATEDIFF(day, order_date, delivery_date) yields the total transit time for each row. Aliasing that result as shipping_days adds the new field without altering order_date or delivery_date, satisfying the scenario.
The DATEADD option is incorrect because DATEADD adds an interval to a date instead of computing a difference. The option that subtracts the two datetimes directly relies on legacy implicit conversion behavior in SQL Server rather than returning a clean integer day count, and it is unreliable across platforms. The option that reverses the start and end arguments in DATEDIFF returns negative values for on-time shipments and thus fails the business rule.
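Building on the correct statement, the later filtering step the data team described could be sketched as follows; the expression is repeated in the WHERE clause because SQL Server does not allow a column alias to be referenced there:

SELECT order_id,
       DATEDIFF(day, order_date, delivery_date) AS shipping_days
FROM shipments
WHERE DATEDIFF(day, order_date, delivery_date) > 7; -- orders that spent more than 7 days in transit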
Ask Bash
How does the DATEDIFF function work in SQL Server?
What is a derived column, and how is it different from a regular column?
Why is DATEADD inappropriate for calculating date differences in this scenario?
A data analyst is working with a data extract from a legacy system saved as product_list.txt. Which statement accurately describes a primary characteristic of a .txt file that the analyst must account for when preparing to load it into a database?
The file organizes data into a nested, hierarchical structure of objects and arrays.
The file is a binary format that requires a specific database driver or proprietary software to be read correctly.
The file format itself does not store metadata or formatting, so the analyst must infer the data's structure, such as the delimiter and character encoding.
The file inherently contains schema information, including data types and column headers, in an unformatted header block.
Answer Description
The correct answer is that a .txt file format does not store metadata or formatting, requiring the analyst to determine the data's structure. A .txt file is a plain text file, meaning it contains only text characters without any embedded information about formatting (like bold or italics) or structure (like defined columns). When ingesting data from a .txt file, the analyst must inspect the file to understand its layout, such as whether the data is separated by commas, tabs, or another delimiter, and determine the correct character encoding (e.g., UTF-8, ANSI) to read the characters correctly.
The other options are incorrect. Describing data in a nested, hierarchical structure is characteristic of a .json or .xml file. A file that is a binary format requiring a specific driver is typical of proprietary database files, not universally readable text files. Lastly, while a .txt file might have a header row, this is simply the first line of text and is not inherent schema information enforced by the file format itself.
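As an illustrative sketch only, a T-SQL load might look like the following; the staging table name, delimiter, header row, and code page are assumptions the analyst would confirm by inspecting the file first:

BULK INSERT staging_products
FROM 'C:\data\product_list.txt'
WITH (
    FIELDTERMINATOR = '\t',    -- assumed tab delimiter
    ROWTERMINATOR   = '\n',
    CODEPAGE        = '65001', -- assumed UTF-8 encoding (supported in newer SQL Server versions)
    FIRSTROW        = 2        -- assumed header row to skip
);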
Ask Bash
What is character encoding in a .txt file?
How can you identify the delimiter in a .txt file?
Why doesn't a .txt file store metadata or schema information?
A data analyst is designing a table to store user login events for a new global web application. The application will have users in multiple time zones, and it is critical that all login times are recorded in a standardized way to allow for accurate chronological analysis, regardless of the user's location. Which data type is the MOST appropriate for the column that will store the login event time?
Datetime
Integer
Timestamp
String
Answer Description
The correct answer is Timestamp. The TIMESTAMP data type is ideal for global applications because it stores date and time values by converting them from the client's time zone to a standardized format, typically Coordinated Universal Time (UTC). When the data is retrieved, it can be converted back to the local time zone of the session, ensuring all events are comparable on a single timeline.
- DATETIME is incorrect because it typically stores the literal date and time value ('YYYY-MM-DD HH:MM:SS') without any time zone information. This can lead to ambiguity when data is collected from multiple time zones, making accurate chronological analysis difficult.
- String (e.g., VARCHAR) is incorrect because storing dates as text is inefficient, prone to formatting errors, and makes date/time calculations and sorting computationally expensive and complex.
- Integer is incorrect because while one could store time as a Unix timestamp (an integer representing seconds since the epoch), the Timestamp data type is specifically designed for this purpose, offering better readability and more powerful built-in functions for date and time manipulation within the database.
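A minimal sketch, assuming a MySQL-style TIMESTAMP column (MySQL stores TIMESTAMP values internally as UTC and converts them to the session time zone on retrieval; the table and column names are hypothetical):

CREATE TABLE login_events (
    user_id  INT NOT NULL,
    login_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP -- stored internally as UTC
);

SET time_zone = '+00:00';  -- report in UTC regardless of where each user logged in
SELECT user_id, login_at FROM login_events ORDER BY login_at;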
Ask Bash
Why is UTC used when storing timestamps for global applications?
What is the difference between the TIMESTAMP and DATETIME data types?
Why is storing time as text (String) or an Integer not ideal for timestamps?
A data analyst is creating a Microsoft SQL Server table to store customer feedback submitted in English, Spanish, Japanese, and Russian. The instance is running SQL Server 2017 with its default (non-UTF-8) collation. Which data type should be assigned to the FeedbackText column so that all characters are saved without corruption or loss of information?
CLOB
NVARCHAR
CHAR
VARCHAR
Answer Description
NVARCHAR is the appropriate choice because it always stores Unicode text using UTF-16, guaranteeing support for virtually all modern languages regardless of the server's code page or collation.
- VARCHAR requires a code-page-specific or UTF-8 collation. When the server is using the default non-UTF-8 collation (as in this scenario), characters outside that code page-such as Japanese or Cyrillic-may be truncated or converted to question marks.
- CHAR is a fixed-length, non-Unicode type under the same limitations as VARCHAR and also wastes space for variable-length feedback.
- CLOB (Character Large Object) is intended for very large blocks of text and is not a standard SQL Server column type; in SQL Server the modern equivalent would be NVARCHAR(MAX). For typical feedback strings, NVARCHAR or NVARCHAR(MAX) is more flexible.
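A minimal T-SQL sketch (the length and sample text are illustrative only):

CREATE TABLE customer_feedback (
    feedback_id   INT IDENTITY(1,1) PRIMARY KEY,
    feedback_text NVARCHAR(2000) NOT NULL -- UTF-16 Unicode, safe for Japanese, Russian, etc.
);

INSERT INTO customer_feedback (feedback_text)
VALUES (N'素晴らしいサービスでした'),  -- the N prefix keeps the literal Unicode
       (N'Отличное обслуживание');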
Ask Bash
Why does NVARCHAR use Unicode, and why is it important for storing text?
What is the difference between NVARCHAR and VARCHAR in SQL Server?
What is the purpose of collation in SQL Server, and how does it affect VARCHAR and CHAR?
A data analyst is preparing a 250,000-row customer data set to train a supervised churn-prediction model. The target column, Churn_Flag, contains Yes/No values for 248,700 customers, while the remaining 1,300 rows have NULL in that column only; every feature in those 1,300 rows is otherwise complete and within expected ranges. Exploratory checks show that dropping 1,300 records will not materially change the class balance or statistical power of the model. The machine-learning library being used will raise an error if the target variable is missing. Which data-cleansing technique is MOST appropriate for handling the 1,300 affected rows before modeling?
Impute each missing Churn_Flag with the most common class so the overall distribution is preserved.
Bin Churn_Flag into broader categories and keep the rows to maximize training data size.
Delete the 1,300 rows that have a NULL value in Churn_Flag before training the model.
Apply min-max scaling to the numeric features so the algorithm can ignore the NULL labels.
Answer Description
Because the missing values occur in the target variable-not in the predictor features-the rows cannot contribute to supervised learning. Imputing or transforming the missing target would inject fabricated labels and risk corrupting the model. Binning or scaling features does nothing to resolve the missing label, and the library will still fail. Given that the affected subset represents only 0.52% of the data and its removal does not bias the class distribution, listwise deletion (dropping those rows) is the proper cleansing step. Imputing the mode would create false churn outcomes; scaling features leaves the NULLs untouched; and binning the target is impossible without a value to bin, so those choices are incorrect.
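A sketch of the cleansing step in SQL (the source and output table names are hypothetical):

SELECT *
INTO churn_training            -- new training extract; the source table stays untouched
FROM customer_churn
WHERE Churn_Flag IS NOT NULL;  -- drops the 1,300 rows with a missing label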
Ask Bash
Why is it important to delete rows with NULL values in the target variable instead of imputing them?
What does class balance mean in the context of machine learning?
How does listwise deletion impact the statistical power of a machine learning model?
A data analyst needs to combine two large fact tables named invoice_2024 and invoice_2025, each holding the same 15 columns and more than three million rows. Because invoice numbers are sequentially generated and never reused, the business has confirmed there will be no duplicate records across the two tables. The merged result will feed year-over-year revenue dashboards, so the SQL written for the merge step must minimize extra processing such as sorting or de-duplication. Which SQL construct should the analyst use to build the consolidated data set with the least overhead?
Use a UNION operator so that only distinct rows are returned.
Create an INNER JOIN between the tables on invoice_number.
Merge the tables with a FULL OUTER JOIN on invoice_number.
Use a UNION ALL operator to append invoice_2025 to invoice_2024.
Answer Description
UNION ALL simply appends the rows from one query result set to another without performing the duplicate-elimination sort that a plain UNION must execute. When the analyst knows that duplicate rows cannot exist-as in separate yearly invoice tables-UNION ALL avoids the expensive sort/merge phase and therefore runs faster on large row counts. Using UNION would trigger an unnecessary DISTINCT operation; either type of JOIN would produce a wider result that repeats columns and requires join logic rather than a vertical append, adding complexity without benefit in this scenario.
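The merge step is then a simple vertical append, for example:

SELECT * FROM invoice_2024
UNION ALL                      -- rows are appended with no duplicate-elimination sort
SELECT * FROM invoice_2025;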
Ask Bash
What is the difference between UNION and UNION ALL in SQL?
Why are joins like INNER JOIN or FULL OUTER JOIN not suitable for this scenario?
What scenarios would be better suited for the UNION operator instead of UNION ALL?
During a quarterly audit, a data warehouse administrator discovers that an intern who moved from the marketing team to the risk team still belongs to both the Marketing_Analyst and Risk_Analyst database roles, giving her access to tables she no longer needs. The administrator wants to adjust the organization's role-based access control (RBAC) policy so that users keep only the permissions required for their current duties and nothing more. Which RBAC principle should the administrator emphasize to prevent this kind of permission creep?
Attribute-based access control
Least privilege
Mandatory access control
Separation of duties
Answer Description
The principle of least privilege states that each user should be granted only the minimum permissions necessary to perform current job functions. If membership in the Marketing_Analyst role had been removed when the intern changed jobs, she would not have retained unneeded access to marketing tables.
Separation of duties focuses on splitting sensitive tasks among different roles to avoid fraud, not on pruning excess permissions. Mandatory access control is a different, highly restrictive model enforced centrally through security labels, and attribute-based access control grants rights dynamically based on attributes rather than predefined roles. Therefore, least privilege is the only principle that directly addresses the problem of users accumulating more privileges than they require.
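In SQL Server terms, the cleanup that should have happened at transfer time is a single membership change (the login name is hypothetical):

ALTER ROLE Marketing_Analyst DROP MEMBER intern_login; -- revoke the role no longer needed
-- membership in Risk_Analyst already covers the new duties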
Ask Bash
What is the principle of least privilege in RBAC?
How does role-based access control (RBAC) differ from attribute-based access control (ABAC)?
What is permission creep, and why is it a security risk?
A data-analytics team is building a new pipeline that will run on a 100-node Apache Spark cluster. The team wants to (1) write code in the same language that Spark itself is implemented in, (2) gain immediate access to Spark's native DataFrame and Dataset APIs when a new Spark version is released, and (3) avoid the extra Py4J (or similar) serialization layer that adds cross-language overhead. According to the CompTIA Data+ list of common programming languages, which language should the team choose?
Scala
Java
Python
R
Answer Description
Scala is the language in which Apache Spark is primarily written, so developing Spark jobs in Scala avoids the Python-to-JVM bridge that Py4J introduces, removes an entire layer of serialization overhead, and lets developers use the DataFrame and strongly typed Dataset APIs as soon as they are available. Python, R, and Java can all be used with Spark, but they either rely on a bridging layer (Py4J for Python, rJava for R) or lag slightly behind Scala in API updates. Java is close, yet Spark's higher-level APIs and examples are maintained first in Scala, making Scala the best fit for the stated requirements.
Ask Bash
Why is Scala the preferred language for Apache Spark development?
What is the purpose of Py4J in Apache Spark?
What are the differences between DataFrame and Dataset APIs in Spark?
A U.S.-based retailer wants to replicate its PostgreSQL production database, which stores personal data about European Union customers, to a cloud analytics cluster located in Singapore. To satisfy the jurisdictional requirement portion of data-compliance planning, which action should the data team perform first?
Validate that the destination cluster enforces column-level encryption for all sensitive fields.
Ensure the replication job meets the required recovery-point and recovery-time objectives.
Confirm that transferring EU personal data to Singapore is permitted and implement an approved cross-border transfer mechanism (for example, Standard Contractual Clauses).
Update the data dictionary to reflect schema changes introduced in the analytics environment.
Answer Description
Jurisdictional requirements focus on laws that apply because of where data is collected, stored, or accessed. Under the GDPR, Singapore is a "third country," so exporting EU personal data there is lawful only if an approved cross-border transfer mechanism-such as an adequacy decision, Standard Contractual Clauses, or Binding Corporate Rules-has been validated. Verifying and implementing that legal basis directly addresses the geographic (jurisdictional) aspect of compliance. The other options deal with technical safeguards (encryption), disaster-recovery timing, or metadata maintenance; none of those, by themselves, satisfy location-based legal obligations.
Ask Bash
What are Standard Contractual Clauses (SCCs) under GDPR?
What is the legal significance of 'third countries' under GDPR?
What is the difference between jurisdictional requirements and technical safeguards for data compliance?
A data analyst is preparing a dataset for a customer segmentation project that will use a distance-based clustering algorithm. The dataset includes the features annual_income, with values ranging from 30,000 to 180,000, and customer_satisfaction_score, with values on a scale of 1 to 5. The analyst is concerned that the annual_income feature will disproportionately influence the clustering results due to its much larger numeric range. Which data transformation technique should be used to prevent this issue and ensure all features contribute more equitably to the analysis?
Parsing
Binning
Imputation
Scaling
Answer Description
The correct answer is Scaling. Scaling, which includes techniques like normalization (Min-Max scaling) and standardization (Z-score scaling), is used to transform the values of numeric features to a similar scale. This is crucial for distance-based algorithms, such as k-means clustering, where features with larger ranges can dominate the distance calculations and skew the results. By scaling annual_income and customer_satisfaction_score to a common range (e.g., 0 to 1), the analyst ensures both features contribute more equitably to the clustering model.
- Binning is incorrect because it involves grouping a range of continuous values into a smaller number of discrete 'bins' or categories. This is used to simplify data or convert it to a categorical format, not to address the influence of different numeric scales in a distance-based algorithm.
- Imputation is the process of replacing missing values in a dataset. The scenario does not mention any missing data, making this technique irrelevant to the problem described.
- Parsing is the process of converting unstructured or semi-structured data (like text strings or log files) into a structured format. This is not applicable to transforming the scale of existing numerical features.
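As a rough sketch, min-max scaling can even be expressed directly in SQL using window aggregates (the table and key column are hypothetical; in practice this is usually done with a scaling function in an analytics library):

SELECT customer_id,
       (annual_income - MIN(annual_income) OVER ()) * 1.0
           / NULLIF(MAX(annual_income) OVER () - MIN(annual_income) OVER (), 0) AS income_scaled,
       (customer_satisfaction_score - MIN(customer_satisfaction_score) OVER ()) * 1.0
           / NULLIF(MAX(customer_satisfaction_score) OVER () - MIN(customer_satisfaction_score) OVER (), 0) AS satisfaction_scaled
FROM customers;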
Ask Bash
Why is scaling important for distance-based algorithms like k-means clustering?
What is the difference between normalization and standardization in scaling?
When should you avoid using binning instead of scaling in data analysis?
A hospital analytics team must share a large set of patient treatment records with an external university researcher. Under GDPR Recital 26, the data will fall outside the scope of data-protection law only when the data subjects are not identifiable by any means that are reasonably likely to be used. Which data-preparation strategy BEST achieves true anonymization before release?
Remove names and mask the last four digits of each Social Security number, but keep full postcode and exact date of birth for analysis.
Encrypt the entire data set with AES-256 and give the researcher only the ciphertext while the hospital retains the decryption key.
Generalize quasi-identifiers (for example, turn exact birth dates into five-year age bands) and publish only aggregated records in which each row represents at least 50 patients, with no key linking back to individuals.
Replace every patient identifier with a random string and store the mapping table on a secure internal server.
Answer Description
GDPR regards data as anonymous only when neither the controller nor any other party can single out, link to, or infer the identity of an individual using means reasonably likely to be employed. The strategy that first generalizes indirect identifiers (such as converting exact dates of birth into five-year age bands) and then aggregates the records so that every published row summarizes at least 50 patients-with no retained mapping key-removes both direct and indirect identifiers and eliminates the possibility of re-identification, satisfying the anonymization test in Recital 26. Replacing patient IDs with random tokens still allows re-identification through the lookup table, so it is merely pseudonymization. Encrypting the file is a security measure; because the controller holds the decryption key, the underlying personal data remain accessible and therefore subject to GDPR. Simply masking names and a portion of Social Security numbers leaves quasi-identifiers (full postcode plus exact date of birth) that can single out individuals, so the data are not anonymous.
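A sketch of the generalize-then-aggregate step in SQL (the table and column names are hypothetical):

SELECT (DATEDIFF(year, date_of_birth, GETDATE()) / 5) * 5 AS age_band_start, -- five-year age bands
       treatment_code,
       COUNT(*) AS patient_count
FROM patient_treatments
GROUP BY (DATEDIFF(year, date_of_birth, GETDATE()) / 5) * 5, treatment_code
HAVING COUNT(*) >= 50;                                                       -- suppress any group smaller than 50 patients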
Ask Bash
What is the difference between anonymization and pseudonymization under GDPR?
What is a quasi-identifier, and how does it impact data anonymization?
Why is generalizing quasi-identifiers and aggregating data effective for anonymization?
A data analyst at an e-commerce company is tasked with analyzing customer transaction data stored in a database. The dataset contains the customer's full name, shipping address, and the complete, unencrypted 16-digit Primary Account Number (PAN). To adhere to the Payment Card Industry Data Security Standard (PCI DSS), which of the following is the most critical measure to apply to the PAN data at rest?
Ensure the data is sent from the database to the analyst's tool over an encrypted channel.
Anonymize the customer name and shipping address fields in the table.
Render the PAN unreadable using an approved method like truncation, tokenization, or encryption.
Implement role-based access control to limit who can query the transaction table.
Answer Description
The correct answer is to render the PAN unreadable using an approved method like truncation, tokenization, or encryption. PCI DSS Requirement 3 specifically mandates that stored Primary Account Numbers (PAN) must be protected. Requirement 3.4 states that the PAN must be rendered unreadable wherever it is stored. Acceptable methods include one-way hashing, truncation (storing only a segment of the PAN, such as the first six and last four digits), tokenization (replacing the PAN with a non-sensitive value), and strong encryption.
- Implementing role-based access control is a necessary part of PCI DSS (Requirement 7), but it is not sufficient on its own. If the data itself is stored unencrypted, a breach of an authorized user's credentials would still expose the plaintext PANs.
- Encrypting data in transit (Requirement 4) is also a key part of PCI DSS, but it does not address the risk of the data being stored insecurely at rest in the database.
- Anonymizing other personally identifiable information like the name and address is a good general data privacy practice but does not address the specific and primary PCI DSS requirement to protect the PAN itself.
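As an illustrative sketch (column names are hypothetical), truncation keeps at most the first six and last four digits of the PAN before the data reaches the analytics table:

SELECT customer_name,
       LEFT(pan, 6) + REPLICATE('X', 6) + RIGHT(pan, 4) AS pan_truncated -- e.g., 411111XXXXXX1111
FROM transactions;

Tokenization or strong encryption with managed keys would satisfy the same requirement.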
Ask Bash
What is PCI DSS and why is it important?
What are the approved methods to render a PAN unreadable?
Why is encryption of data at rest different from encryption in transit?
A data analyst is defining a column to store customer email addresses in a SQL Server table. Values can be anywhere from 5 to 320 characters long and contain only ASCII characters, and the column should cap entries at 320 characters. To minimize storage while supporting this range, which column definition is the most appropriate?
varchar(MAX)
char(320)
varchar(320)
nvarchar(320)
Answer Description
VARCHAR stores variable-length, non-Unicode text and allocates only the bytes actually needed for each row (plus a small length overhead). Because most email addresses are shorter than the 320-character ceiling, a fixed-length CHAR column would waste space by padding shorter values, and NVARCHAR would double storage because it uses two bytes per character for Unicode. VARCHAR(MAX) allows strings far larger than 320 characters and has additional management overhead. Therefore, defining the column as VARCHAR(320) provides the required capacity with the smallest average storage footprint.
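A minimal sketch of the resulting column definition (the table name is hypothetical):

CREATE TABLE customers (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    email       VARCHAR(320) NOT NULL -- single-byte characters; only the bytes actually used are stored
);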
Ask Bash
What is the difference between VARCHAR and NVARCHAR?
Why does VARCHAR save space compared to CHAR?
When would VARCHAR(MAX) be a better choice than VARCHAR(n)?
During routine monitoring, a data analyst observes a surge of suspicious SQL queries from an external address targeting the company's production database that stores customer PII. The organization's incident-response procedure is based on NIST SP 800-61/800-171 guidance. Which action should the analyst take first to satisfy security-incident reporting requirements and preserve evidence?
Notify the designated incident-response contact immediately with a timestamp, affected system details, and preliminary indicators.
Wait for the forensics team to confirm data exfiltration, then report the event in the next change-control meeting.
Shut down the database server and delete temporary files that may contain traces of the attack.
Apply emergency patches to the database server to close the suspected vulnerability before telling anyone.
Answer Description
Best-practice frameworks such as NIST SP 800-61 and SP 800-171 state that when potential security incidents are detected, the event must be documented and reported through the organization's defined channel before containment or eradication begins. Prompt notification supplies the incident-response team with the information they need to triage, coordinate containment and forensics, and ensures that critical evidence (for example, logs, memory images, or query traces) is not destroyed. Patching a live system, powering it down, or delaying notification until after a full forensic review can compromise evidence or violate required reporting time frames.
Ask Bash
What is NIST SP 800-61 and SP 800-171?
Why is immediate reporting critical during a potential security incident?
What could happen if the analyst delayed reporting or acted improperly?
A company stores its operational data in a MySQL 8.0 server. You have been asked to reverse-engineer the existing schema into an entity-relationship (EER) diagram, make minor table changes with a visual editor, and run ad-hoc SQL queries in the same workspace. Which tool from the CompTIA Data+ list is specifically designed for this end-to-end database modeling and administration task?
Power BI Desktop
pandas (Python library)
RStudio
MySQL Workbench
Answer Description
MySQL Workbench is purpose-built for designing and administering MySQL databases. It can reverse-engineer an existing schema into an EER diagram, let you edit tables and relationships in a graphical canvas, and includes an integrated SQL editor for running queries-all within one interface. RStudio is an IDE for R/Python code, not a database design tool. Power BI Desktop focuses on building dashboards and does not create or modify database schemas. The pandas library supplies data-wrangling functions inside Python but offers no visual modeling or direct database administration features.
Ask Bash
What is an EER diagram, and why is it used?
What features of MySQL Workbench make it suited for database administration?
Why can't Power BI or pandas be used for database schema design?
A reporting analyst is designing a relational table to archive detailed customer feedback notes. Each note can contain up to 200 KB (about 200,000 characters) of plain Unicode text, and analysts will need to run SQL string functions-such as LIKE pattern searches and SUBSTRING extraction-directly against the stored content. Which data type should the analyst assign to the column that stores the feedback notes so the requirement is satisfied without imposing an unnecessary size limit?
CLOB (Character large object)
VARCHAR(255)
BLOB (Binary large object)
FLOAT
Answer Description
The Character Large Object (CLOB) data type is intended for very large blocks of text-often up to several gigabytes-while still allowing the database engine to treat the content as character data. Because the data remain in a character-encoded form, built-in SQL text operations (e.g., LIKE, SUBSTRING) can be applied. A BLOB, by contrast, stores raw binary data and does not natively support text functions. Standard VARCHAR columns are limited to a few thousand bytes in most platforms (for example, 2-4 KB) and therefore cannot hold 200 KB of text. Numeric types such as FLOAT are meant for numbers, not text. Hence, CLOB is the most appropriate choice.
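A minimal sketch, assuming an Oracle-style CLOB column (in SQL Server the equivalent type would be NVARCHAR(MAX)):

CREATE TABLE feedback_notes (
    note_id   NUMBER PRIMARY KEY,
    note_text CLOB                              -- character data, so text functions still apply
);

SELECT note_id,
       SUBSTR(note_text, 1, 200) AS preview     -- extract the first 200 characters
FROM feedback_notes
WHERE note_text LIKE '%refund%';                -- pattern search on the stored text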
Ask Bash
What is the primary purpose of a CLOB data type?
How does a CLOB differ from a BLOB in relational databases?
Why is VARCHAR not suitable for storing 200 KB of text?
A data analyst is working with a data file named telemetry.dat, which was exported from a proprietary industrial control system. When the analyst attempts to open the file with a standard text editor, it displays a mix of readable text headers and large sections of unreadable, non-printable characters. What is the most likely characteristic of this file?
It is a universally standard Character Large Object (CLOB) file that requires a specific database driver to open.
It uses a proprietary or custom structure defined by the industrial control system's software.
It is a plain text file, and the unreadable characters indicate that the file was corrupted during export.
It is a compressed file archive, similar to a .zip file, containing multiple log files.
Answer Description
The correct answer is that the file uses a proprietary or custom structure defined by the creating software. The .dat file extension is short for 'data' and is used as a generic container for information specific to an application. It has no single, standardized format. The scenario described, with a mix of readable text and unreadable binary characters, is a common indicator of a proprietary format that requires specific software or knowledge of its unique structure to be parsed correctly.
- The suggestion that the file is corrupted is a plausible but less likely primary explanation. Proprietary files are often intentionally designed with mixed text and binary content.
- While some applications might use a .dat extension for a compressed archive, it is not the default meaning. Also, a standard compressed file would typically appear as entirely unreadable binary data in a text editor, not a mix with clear headers.
- A Character Large Object (CLOB) is a data type used within a database to store large amounts of text, not a standard file format itself, making this option incorrect.
Ask Bash
Why are proprietary file formats necessary for certain software systems?
How can a data analyst open and use data from a proprietary file format?
What precautions should be taken when working with `.dat` files?
Gnarly!
Looks like that's it! You can go back and review your answers or click the button below to grade your test.