Microsoft Azure AI Engineer Associate AI-102 Practice Question

Your team runs a prompt flow in Azure AI Foundry that calls an Azure OpenAI deployment named "chat-prod". The deployment currently uses the gpt-35-turbo-1106 model and often hits its tokens-per-minute (TPM) quota during peak hours. You also need to move to the newer gpt-35-turbo-0125 model without breaking any running clients. Which approach best meets both requirements: higher, scalable throughput and a zero-downtime model update?

  • Edit the existing deployment to change only the model name to gpt-35-turbo-0125 and rely on Azure OpenAI to raise tokens-per-minute limits automatically.

  • Delete the current "chat-prod" deployment, recreate it with gpt-35-turbo-0125, and then ask Microsoft Support for a higher quota on the new deployment.

  • Provision another Azure OpenAI resource in a different region, deploy gpt-35-turbo-0125 there, and hard-code your application to send half of the requests to the new endpoint.

  • Create a second deployment in the same Azure OpenAI resource that uses gpt-35-turbo-0125, request the additional tokens-per-minute quota for it, and gradually switch traffic from "chat-prod" to the new deployment before deleting the original one.
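The gradual traffic switch described in the last option can be sketched in client code as weighted routing between the two deployment names. This is a minimal illustration, not an official pattern: the deployment name "chat-prod-0125" and the ramp-up fraction are assumptions, and in production the fraction would typically come from configuration rather than a hard-coded value.

```python
import random

# Hypothetical deployment names for the blue-green style cutover.
OLD_DEPLOYMENT = "chat-prod"        # existing gpt-35-turbo-1106 deployment
NEW_DEPLOYMENT = "chat-prod-0125"   # assumed name for the new gpt-35-turbo-0125 deployment


def pick_deployment(new_traffic_fraction: float) -> str:
    """Route a request to the new deployment with the given probability.

    Ramping new_traffic_fraction from 0.0 to 1.0 over time shifts load
    gradually; once it reaches 1.0 and is stable, the old deployment can
    be deleted without any client-visible downtime.
    """
    if random.random() < new_traffic_fraction:
        return NEW_DEPLOYMENT
    return OLD_DEPLOYMENT
```

Because both deployments live in the same Azure OpenAI resource, only the deployment name in each request changes; the endpoint and API key stay the same, so running clients are unaffected during the switch.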

Implement generative AI solutions