CompTIA DataX DY0-001 (V1) Practice Question

A data science team is tasked with extracting information from thousands of biomedical research papers. They are using a powerful, pre-trained transformer-based Named Entity Recognition (NER) model that was trained on a general news and web text corpus. The model performs poorly, frequently failing to identify or misclassifying domain-specific entities such as protein names, gene sequences, and complex chemical compounds. Which of the following represents the most effective and direct strategy to significantly improve the model's performance on this specialized corpus?

  • Replace the transformer-based architecture with a Conditional Random Field (CRF) model trained from scratch on the specialized biomedical corpus.

  • Fine-tune the pre-trained transformer model using a manually annotated dataset of biomedical research papers.

  • Develop an extensive set of regular expressions and dictionary-based rules to specifically target and extract the biomedical entities.

  • Apply aggressive text normalization techniques, such as stemming and stop word removal, to the biomedical text before processing it with the existing model.
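
For reference, the "manually annotated dataset" mentioned in the fine-tuning option is commonly stored as per-token BIO tags (B = beginning of an entity, I = inside, O = outside). The following is a minimal sketch of that conversion, assuming whitespace tokenization and text without punctuation. The example sentence, the GENE and PROTEIN labels, and the helper names `span` and `spans_to_bio` are hypothetical and do not come from the exam material.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: locate the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert character-level entity spans into per-token BIO tags.

    Assumes whitespace tokenization; a real pipeline would also need to
    handle punctuation and the model's subword tokenizer.
    """
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # character offset of this token
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:  # token lies inside the entity span
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


# Illustrative sentence and annotations (assumed, not real training data).
text = "BRCA1 interacts with tumor protein p53 in human cells"
annotations = [
    span(text, "BRCA1", "GENE"),
    span(text, "tumor protein p53", "PROTEIN"),
]
bio = spans_to_bio(text, annotations)

for token, tag in bio:
    print(f"{token}\t{tag}")
```

Tagged sequences of this shape are what a token-classification model is usually trained on. Multi-word entities such as "tumor protein p53" become one B- tag followed by I- tags.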

Specialized Applications of Data Science