CompTIA DataX DY0-001 (V1) Practice Question

During feature engineering for a multinomial logistic-regression model that predicts customer churn, a data scientist converts the nominal column "EmployerIndustry" (≈50 unique string values) to numeric form by assigning each category an integer from 0 through 49 with a basic label-encoding step. The single encoded column is then supplied to the model. Cross-validation reveals unstable coefficients and erratic performance.

Which statement best explains why this encoding choice is inappropriate for the selected algorithm?

  • The integer codes impose a false ordinal relationship among industries, leading logistic regression to treat higher codes as having proportionally larger (or smaller) effects on churn probability.

  • Label encoding expands the feature into dozens of sparse columns, and the resulting high dimensionality destabilizes the optimizer.

  • LabelEncoder in scikit-learn supports at most 32 distinct categories; using it with ≈50 levels causes excessive variance in the estimated coefficients.

  • Label encoding internally converts the column to a single binary flag, discarding most of the information needed by the model.

CompTIA DataX DY0-001 (V1)
Modeling, Analysis, and Outcomes
Your Score:
Settings & Objectives
Random Mixed
Questions are selected randomly from all chosen topics, with a preference for those you haven’t seen before. You may see several questions from the same objective or domain in a row.
Rotate by Objective
Questions cycle through each objective or domain in turn, helping you avoid long streaks of questions from the same area. You may see some repeat questions, but the distribution will be more balanced across topics.

Check or uncheck an objective to set which questions you will receive.

SAVE $64
$529.00 $465.00
Bash, the Crucial Exams Chat Bot
AI Bot