A data engineering team is ingesting two new data sources into a cloud data lake. The first source is a set of monthly customer-sales extracts that the ERP system exports as fixed-size CSV files. The second source comprises the company's firewall log files, which record every network connection as it happens. To minimize compute costs and avoid re-loading duplicate records, which consideration BEST explains why the ingestion process for the firewall logs should be designed as a streaming or incremental load rather than a full-replace load?
The log files store binary image data that must be base-64 decoded, so the entire file has to be processed on every run.
The log files follow a strict relational DDL schema, so a full reload is required each time to preserve referential integrity.
The log files are continuously appended, so the pipeline should ingest only the newly written lines via streaming or incremental processing.
Each log file is limited to exactly 10 MB, making it simplest to delete and reload the whole file whenever it reaches that limit.
Log files are generated continuously, and each new event is appended to the end of an existing file (or to a new file after rotation). Because the file keeps growing, reloading the entire log on every run would waste compute and reprocess rows that have already been captured. An incremental or streaming pipeline instead reads only the newly written lines, or tails the log stream, and tracks how far it has read so that each event is ingested once. The other statements are incorrect: firewall logs do not follow a rigid relational DDL schema, they are not inherently limited to 10 MB, and they are plain text (or lightly structured) rather than binary image data that requires base-64 decoding.
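The incremental approach described in the explanation can be illustrated with a minimal Python sketch. It assumes a plain-text, newline-delimited log and a byte offset that the pipeline saves between runs; the function name `read_new_lines`, the sample log lines, and the treatment of a shrinking file as a rotation are illustrative assumptions, not part of the exam material or of any particular ingestion tool.

```python
import os
import tempfile


def read_new_lines(log_path, offset):
    """Return (lines, new_offset) for complete lines written after `offset`.

    `offset` is the byte position saved by the previous run (hypothetical
    checkpoint; a real pipeline would persist it somewhere durable).
    """
    size = os.path.getsize(log_path)
    if size < offset:
        # Assumption: a file smaller than the checkpoint was rotated or
        # truncated, so reading restarts from the beginning.
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so that a half-written line
    # is left in place for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


if __name__ == "__main__":
    # Simulate a firewall log that grows between two pipeline runs.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "firewall.log")
        with open(path, "w") as f:
            f.write("conn 1\nconn 2\n")
        batch, checkpoint = read_new_lines(path, 0)
        print(batch, checkpoint)

        with open(path, "a") as f:
            f.write("conn 3\n")
        batch, checkpoint = read_new_lines(path, checkpoint)
        print(batch, checkpoint)
```

On the second call only the appended line is returned, which is the behaviour that avoids duplicate records. A full-replace load would instead reread all three lines on every run.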
CompTIA Data+ DA0-002 (V2)
Data Concepts and Environments