CompTIA DataX DY0-001 (V1) Practice Question

An e-commerce company is designing a real-time bidding agent that must output a continuous bid price ($0–$5) for every advertising impression. The state representation contains hundreds of contextual features, and millions of interactions are logged each day, so the team plans to store experience in a replay buffer and train off-policy. They also want an update rule whose gradient estimates have lower variance than pure Monte Carlo policy-gradient methods. Which reinforcement-learning algorithm is the most appropriate starting point for these requirements?

  • Upper Confidence Bound (UCB1) multi-armed bandit strategy

  • Deep Deterministic Policy Gradient (actor-critic with experience replay)

  • On-policy SARSA(λ) with eligibility traces

  • Tabular Q-learning with ε-greedy exploration

Domain: Specialized Applications of Data Science
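
For orientation, here is a minimal sketch of the actor-critic-with-replay approach named in the second option (Deep Deterministic Policy Gradient), written against the scenario above. It is illustrative only: STATE_DIM, the network sizes, learning rates, noise scale, and the act() helper are assumptions rather than exam content, and terminal flags are omitted for brevity by treating each impression as a step in a continuing task.

    # Minimal DDPG-style sketch (PyTorch) for the bidding scenario above.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    STATE_DIM = 300        # stand-in for "hundreds of contextual features"
    MAX_BID = 5.0          # continuous action: bid price in [0, 5]
    GAMMA, TAU = 0.99, 0.005

    class Actor(nn.Module):
        """Deterministic policy mu(s): maps a state to one continuous bid."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM, 128), nn.ReLU(),
                nn.Linear(128, 1), nn.Sigmoid())   # squash to (0, 1)

        def forward(self, s):
            return self.net(s) * MAX_BID           # rescale to (0, 5)

    class Critic(nn.Module):
        """Q(s, a): learned critic replacing high-variance Monte Carlo returns."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + 1, 128), nn.ReLU(),
                nn.Linear(128, 1))

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=1))

    actor, critic = Actor(), Critic()
    actor_tgt, critic_tgt = Actor(), Critic()
    actor_tgt.load_state_dict(actor.state_dict())
    critic_tgt.load_state_dict(critic.state_dict())
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    # Off-policy replay buffer holding (s, a, r, s2) tensors shaped
    # [STATE_DIM], [1], [1], [STATE_DIM].
    replay = deque(maxlen=1_000_000)

    def act(state, noise_std=0.1):
        """Behavior policy: deterministic bid plus Gaussian exploration noise."""
        with torch.no_grad():
            bid = actor(state.unsqueeze(0)).squeeze(0)
        return (bid + noise_std * torch.randn_like(bid)).clamp(0.0, MAX_BID)

    def train_step(batch_size=256):
        s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, batch_size)))
        # Critic: one-step TD target from slow-moving target networks.
        with torch.no_grad():
            y = r + GAMMA * critic_tgt(s2, actor_tgt(s2))
        critic_loss = nn.functional.mse_loss(critic(s, a), y)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        # Actor: deterministic policy gradient, ascend Q along the actor's bid.
        actor_loss = -critic(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Polyak-average the target networks for stability.
        for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
            for p_t, p in zip(tgt.parameters(), src.parameters()):
                p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)

Note how the scenario's three requirements map onto the pieces: the sigmoid-squashed actor emits a continuous bid in (0, 5), the deque buffer supports off-policy updates from logged experience, and the critic's one-step TD target replaces full Monte Carlo returns, which is what lowers the gradient variance. Exploration comes from the additive noise in act() rather than from a stochastic policy.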