A financial services company is analyzing a high-velocity stream of credit card transactions to build a real-time fraud detection model. The data captures each transaction as a discrete, atomic event. From a data engineering perspective, which statement most accurately identifies a primary challenge in preparing this raw transactional data for a predictive model?
The atomic nature of individual events requires feature engineering through time-based windowing to create behaviorally relevant aggregates (e.g., transaction frequency, rolling spend averages) that provide predictive context.
Processing the high volume requires specialized hardware like Tensor Processing Units (TPUs), which are specifically designed for ingesting and parsing sequential event data.
The data is typically unstructured, requiring complex natural language processing (NLP) to extract entities before numerical analysis can be performed.
The data-generating process is subject to significant self-selection bias, which must be corrected using stratified sampling before the data is considered representative.
The correct answer identifies the fundamental challenge with event-based transactional data: individual events, viewed in isolation, lack predictive context. To make the data useful for a machine learning model, particularly for fraud detection, feature engineering must build aggregates over time-based windows. This produces stateful features, such as transaction frequency or rolling spend averages, that describe behavior over time and supply the context needed for accurate predictions.
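As a concrete illustration of this idea (not part of the original question), the following is a minimal sketch of time-based windowing with pandas. The DataFrame columns (card_id, timestamp, amount) and the one-hour trailing window are illustrative assumptions; a production pipeline would typically compute the same aggregates in a streaming engine.

```python
import pandas as pd

# Hypothetical transaction log: each row is a single atomic event.
transactions = pd.DataFrame({
    "card_id": ["A", "A", "A", "B", "B", "A"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 10:40",
        "2024-01-01 11:00", "2024-01-01 11:20", "2024-01-01 12:30",
    ]),
    "amount": [25.0, 740.0, 15.0, 60.0, 45.0, 30.0],
})

# Sort by event time and use it as the index so time-based windows work.
transactions = transactions.sort_values("timestamp").set_index("timestamp")

# For each card, aggregate the trailing one-hour window of activity so every
# event carries behavioral context: how many transactions occurred and the
# average spend over the past hour.
features = (
    transactions
    .groupby("card_id")["amount"]
    .rolling("1h")
    .agg(["count", "mean"])
    .rename(columns={"count": "txn_count_1h", "mean": "avg_spend_1h"})
    .reset_index()
)

print(features)
```

Each output row now pairs a raw event with stateful features (txn_count_1h, avg_spend_1h) that a fraud model can consume directly.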
The other options are incorrect for the following reasons:
Transactional data is a classic example of structured data (e.g., tables with columns for timestamp, amount, merchant ID) or semi-structured data (e.g., JSON logs), not unstructured data like free-form text or images.
While the customer base might not perfectly represent the entire population, 'self-selection bias' applies most precisely to surveys or studies in which subjects voluntarily opt in to participate. For machine-generated transactional data of this kind, it is not the primary data preparation challenge; the need for feature engineering is.
Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are hardware accelerators designed primarily for the large-scale matrix and tensor computations involved in training complex machine learning models, not for the general data ingestion, parsing, and aggregation tasks of stream pre-processing.