CompTIA DataX DY0-001 (V1) Practice Question

A data science team is developing a pipeline to digitize a large archive of historical financial ledgers from the early 20th century. The documents are scanned in grayscale and suffer from significant issues, including page yellowing, ink bleed-through, and non-standard, multi-column layouts. An initial implementation using a standard Tesseract OCR configuration yields a very high Word Error Rate (WER). To achieve the most significant improvement in extraction accuracy, which of the following computer vision techniques should the team prioritize?

  • Use a super-resolution generative adversarial network (SRGAN) to increase the effective DPI of the source images before OCR processing.

  • Fine-tune a pre-trained Transformer-based language model on a modern financial text corpus to post-process the OCR output and correct recognition errors.

  • Implement adaptive thresholding for binarization, followed by document layout analysis to segment text blocks and columns before passing them to the OCR engine.

  • Apply data augmentation techniques such as random rotation and scaling to the training dataset of the core OCR recognition model.
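For readers unfamiliar with the preprocessing steps named in the third option, the following is a minimal, dependency-free sketch of the two ideas: local (adaptive) mean thresholding, and column segmentation from a vertical projection profile. The function names, the synthetic 5x12 "page", and the `window`, `offset`, and `min_gap` values are all illustrative assumptions, not part of the question. A real pipeline would more likely use a library routine such as OpenCV's `cv2.adaptiveThreshold` and a dedicated layout-analysis step.

```python
def adaptive_threshold(img, window=5, offset=10):
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbourhood is clipped at the image borders.
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            local = [img[j][i] for j in ys for i in xs]
            mean = sum(local) / len(local)
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out


def split_columns(binary, min_gap=2):
    """Return (start, end) x-ranges (end exclusive) of text columns.

    Uses a vertical projection profile: x positions with no ink for at
    least min_gap consecutive pixels are treated as column gutters.
    """
    w = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(w)]
    columns, start, gap = [], None, 0
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                columns.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        columns.append((start, w - gap))
    return columns


# Hypothetical synthetic page: background darkens left to right (uneven
# yellowing), with two ink columns at x = 1..3 and x = 8..10.
WIDTH, HEIGHT = 12, 5
INK_X = {1, 2, 3, 8, 9, 10}
row = [(200 - 8 * x) - (80 if x in INK_X else 0) for x in range(WIDTH)]
page = [list(row) for _ in range(HEIGHT)]

binary = adaptive_threshold(page)
columns = split_columns(binary)
print(binary[0])
print(columns)
```

In this toy page the ink at x = 1 and the bare background at x = 11 both have the value 112, so no single global cutoff can separate ink from background, while the local comparison can. Each recovered column range could then be cropped and passed to the OCR engine separately.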

Specialized Applications of Data Science