CompTIA DataX DY0-001 (V1) Practice Question

You are tasked with building word embeddings for a biomedical text-mining system. The corpus contains many domain-specific compound words that appear only once (for example, "interferon-beta-1a"), yet researchers still want to query the vectors with arithmetic analogies such as "ribosome − protein + RNA ≈ ?". Which embedding approach most directly meets the dual requirement of (1) assigning informative vectors to these low-frequency or unseen tokens and (2) preserving the linear relationships exploited by analogy tasks, without fine-tuning a large language model?

  • Factorize a global word-word co-occurrence matrix with GloVe to obtain dense vectors.

  • Train a skip-gram Word2vec model with negative sampling on word tokens only.

  • Use fastText to learn subword-level skip-gram embeddings that compose each word vector from its character n-grams.

  • Create one-hot vectors for every word in the corpus and apply principal component analysis to reduce their dimensionality.
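The subword (fastText-style) approach satisfies both requirements: a word's vector is composed from its character n-grams, so even a one-off compound like "interferon-beta-1a" inherits information from n-grams shared with frequent words, while the skip-gram training objective preserves the roughly linear structure that analogy arithmetic relies on. A minimal sketch of the n-gram composition step (the `ngram_vectors` lookup table is a hypothetical stand-in for trained n-gram embeddings):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams using fastText's '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

def word_vector(word, ngram_vectors, dim=100):
    """Compose a word vector by averaging the vectors of its known n-grams.

    ngram_vectors: dict mapping n-gram string -> np.ndarray of shape (dim,).
    Unseen words still get a vector as long as some n-grams are known.
    """
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not vecs:
        return np.zeros(dim)  # no overlapping n-grams at all
    return np.mean(vecs, axis=0)
```

With vectors composed this way, an analogy query is answered by nearest-neighbor search on `v("ribosome") - v("protein") + v("RNA")` under cosine similarity, exactly as with whole-word embeddings.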

Specialized Applications of Data Science