Question 21:
You are training a large Transformer-based NLP model on Azure Machine Learning using a GPU compute cluster. During training, you observe that GPU memory usage is high and training frequently crashes with out-of-memory errors. You want to maintain model accuracy while reducing memory usage. Which solution is the most effective and aligned with Azure ML best practices?
A) Reduce the GPU cluster size to force slower but stable training
B) Enable mixed-precision training using frameworks such as PyTorch AMP or TensorFlow FP16
C) Disable gradient accumulation to reduce memory overhead
D) Lower the input sequence length during both training and inference
Answer:
B
Explanation:
Training large NLP models is a major focus area in machine learning workloads, and Azure Machine Learning supports many optimization techniques for handling memory-intensive deep learning operations. Out-of-memory errors are especially common when working with large sequence lengths, deep Transformer layers, multi-GPU distributed training, or enormous batch sizes. The DP-100 exam often includes questions about optimizing memory efficiency, especially for GPU-based workloads on Azure ML compute clusters.
Option B is correct because mixed-precision training, particularly using frameworks like PyTorch Automatic Mixed Precision (AMP) or TensorFlow's FP16 mixed-precision policy, significantly reduces GPU memory consumption without sacrificing accuracy. Mixed-precision training uses half-precision floating-point numbers for most operations while keeping master weights in full precision and applying loss scaling to prevent gradient underflow. This approach allows larger batch sizes, enables deeper models, and reduces the memory footprint dramatically. Azure ML GPU VM series such as NCv3 (V100), ND A100 v4 (A100), and NCasT4_v3 (T4) include Tensor Cores optimized for FP16 computation. Leveraging these cores leads to faster operations, improved throughput, and fewer memory constraints. Mixed-precision training is widely adopted for large Transformer models and is considered a best practice in modern deep learning pipelines.
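For context, here is a minimal PyTorch AMP sketch of the pattern described above. It assumes a model, optimizer, loss_fn, and train_loader already exist; it illustrates the standard autocast/GradScaler workflow rather than code from any particular Azure ML sample.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # dynamically scales the loss to avoid FP16 gradient underflow

for inputs, targets in train_loader:           # assumed DataLoader
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    with autocast():                           # run the forward pass in mixed precision
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    scaler.scale(loss).backward()              # backward pass on the scaled loss
    scaler.step(optimizer)                     # unscales gradients, then steps the optimizer
    scaler.update()                            # adjusts the scale factor for the next iteration
```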
Option A is incorrect because reducing GPU cluster size does nothing to mitigate memory usage per GPU. If the model is already too large for a given GPU, decreasing compute resources only slows the training process and provides no memory relief. Azure ML encourages scaling out rather than scaling down in memory-bound situations.
Option C is incorrect because disabling gradient accumulation increases peak memory usage, making out-of-memory issues even worse. Gradient accumulation allows training with effectively larger batch sizes by splitting each batch across several smaller forward/backward passes. Removing it reduces flexibility and increases memory pressure.
Option D, lowering input sequence length, can technically reduce memory usage but often significantly impacts model performance and accuracy in NLP tasks. For tasks where sequence information is critical, such as summarization or long-document classification, reducing sequence length may cause major performance degradation. This choice sacrifices model quality, which contradicts the requirement of maintaining accuracy.
Therefore, option B is the best solution because mixed-precision training is specifically designed to optimize GPU memory usage while maintaining model performance. The DP-100 exam stresses leveraging frameworks and hardware-accelerated compute optimizations rather than degrading model architectures or reducing workload complexity. Mixed-precision allows practitioners to train large models efficiently, exploit Azure ML GPU hardware capabilities, and avoid out-of-memory interruptions. This approach aligns perfectly with Azure ML deep learning best practices and modern scalable model development workflows.
Question 22:
A machine learning team stores raw training data in Azure Data Lake Storage Gen2 and wants to build a reproducible Azure ML pipeline that performs preprocessing, feature engineering, model training, and evaluation. They want every pipeline execution to be fully traceable, including code snapshots, parameter settings, dataset versions, and model outputs. Which Azure ML capability best ensures full reproducibility and traceability across the entire workflow?
A) Logging metrics manually in text files
B) Using Azure ML Experiments and pipeline step versioning
C) Saving only the final model in the Model Registry
D) Running training scripts on local compute and uploading output artifacts manually
Answer:
B
Explanation:
Reproducibility is a central requirement in machine learning operations, and Azure Machine Learning provides a comprehensive system for tracking executions, code snapshots, metrics, datasets, environments, and models. The DP-100 exam reinforces the importance of reproducible pipelines, emphasizing features such as Experiment tracking, Run history, Dataset versioning, Environment registration, Pipeline versioning, and Model Registry integration.
Option B is correct because Azure ML Experiments and pipeline step versioning work together to provide complete reproducibility. Each pipeline step execution generates a Run object, which captures parameters, code snapshots, environment definitions, dataset versions, execution logs, metrics, artifacts, and outputs. Azure ML automatically stores this information so that each pipeline run is fully traceable. Pipeline versioning ensures that each stage—preprocessing, feature engineering, model training, and evaluation—can be reproduced by referencing the exact configuration used during the original run. When Datasets are versioned in Azure ML, every execution references a specific snapshot of raw or transformed data, preventing accidental changes. Using Experiments, data scientists can view run histories, compare runs, identify performance differences, and maintain compliance. These features combine to form the core reproducibility framework evaluated in the DP-100 exam.
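As a rough illustration of this pattern, the SDK v1 sketch below submits a training script under an Experiment while pinning a specific dataset version; the workspace configuration, dataset name, environment name, and cluster name are assumptions.

```python
from azureml.core import Workspace, Experiment, Dataset, ScriptRunConfig, Environment

ws = Workspace.from_config()

# Pin an explicit dataset version so every run is traceable to the same snapshot
dataset = Dataset.get_by_name(ws, name="raw-data", version=3)   # assumed FileDataset

env = Environment.get(ws, name="training-env")   # registered, versioned environment

src = ScriptRunConfig(
    source_directory="src",
    script="train.py",
    arguments=["--data", dataset.as_named_input("raw").as_mount()],
    compute_target="cpu-cluster",
    environment=env,
)

# Submitting under an Experiment records the code snapshot, parameters, logs, and outputs
run = Experiment(ws, name="training-pipeline").submit(src)
run.wait_for_completion(show_output=True)
```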
Option A is incorrect because logging metrics manually in text files cannot guarantee reproducibility, lacks centralized visibility, and does not integrate with pipeline metadata or versioning systems.
Option C is insufficient because saving only the model ignores datasets, parameters, environments, and transformation logic. Reproducibility requires the full lineage, not just final artifacts.
Option D is incorrect because running scripts locally breaks controlled environment tracking. Manual artifact uploading introduces inconsistencies, missing metadata, and non-reproducible results. Azure ML discourages using local compute for critical production workflows.
Therefore, option B is the best solution because Experiments and pipeline step versioning provide a structured, automated, and secure way to maintain full traceability and reproducibility across the entire machine learning lifecycle, from raw data to deployed models.
Question 23:
A team is preparing a large dataset of 50 million records for training. They want to use Azure ML to build a pipeline step that preprocesses data in parallel across many nodes. The processing script should run unchanged while Azure ML manages data partitioning and distributed execution. Which Azure ML feature is designed specifically for this large-scale parallel data processing scenario?
A) ParallelRunStep
B) AutoML training jobs
C) HyperDrive with random sampling
D) ScriptRunConfig on a single GPU VM
Answer:
A
Explanation:
ParallelRunStep is one of the most important scalable compute features covered in the DP-100 exam. It is specifically designed for parallelizing large data processing tasks across multiple nodes in a compute cluster. Large datasets such as 50 million records require distributed execution for timely processing. Azure ML handles data partitioning, job assignment, worker orchestration, retries, and output aggregation automatically when ParallelRunStep is used. It allows data scientists to keep the processing script unchanged, simplifying maintenance and reducing error risk.
Option A is correct because ParallelRunStep automatically splits the input dataset into mini-batches and assigns these batches to multiple nodes for parallel processing. It works seamlessly with TabularDatasets, FileDatasets, and data in blob storage. The processing script does not need to be aware of the underlying distribution logic, making it ideal for large-scale operations. The step is also designed for batch inference, preprocessing, feature extraction, and massive parallel computation. Azure ML distributes the mini-batches across nodes, retries failed batches automatically, and aggregates the results reliably.
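A hedged sketch of how such a step might be declared with the SDK v1 ParallelRunConfig and ParallelRunStep classes; the dataset, environment, and cluster names are placeholders, and the mini-batch size and node counts are illustrative only.

```python
from azureml.core import Workspace, Dataset, Environment
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

ws = Workspace.from_config()
input_ds = Dataset.get_by_name(ws, "raw-records")        # assumed registered TabularDataset
output = OutputFileDatasetConfig(name="preprocessed")

parallel_config = ParallelRunConfig(
    source_directory="src",
    entry_script="preprocess.py",     # defines init() and run(mini_batch), no distribution logic needed
    mini_batch_size="1MB",            # amount of data each worker call receives
    error_threshold=10,               # tolerated failed records before the step fails
    output_action="append_row",       # merge per-batch results into a single output
    environment=Environment.get(ws, "preprocess-env"),
    compute_target="cpu-cluster",
    node_count=8,
    process_count_per_node=4,
)

step = ParallelRunStep(
    name="parallel-preprocess",
    parallel_run_config=parallel_config,
    inputs=[input_ds.as_named_input("records")],
    output=output,
)

pipeline = Pipeline(workspace=ws, steps=[step])
```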
Option B is incorrect because AutoML is used for model training and experiment automation, not large-scale preprocessing or distributed ETL workloads.
Option C is incorrect because HyperDrive performs hyperparameter tuning, which is parallel but operates on different parameter combinations—not data partitions. It does not distribute dataset processing.
Option D is insufficient because running a ScriptRunConfig on a single VM cannot handle 50 million records efficiently. It does not distribute workloads and will be significantly slower.
Thus, option A is the correct answer because ParallelRunStep is designed precisely for scalable, distributed data processing pipelines in Azure ML.
Question 24:
You are deploying a machine learning model as a managed online endpoint in Azure Machine Learning. The model depends on a custom tokenizer and several large static assets. You want to minimize endpoint latency and ensure all assets are loaded efficiently. What is the best deployment strategy?
A) Load assets dynamically within the run() function for each request
B) Package assets inside the Docker image or environment and load them in init()
C) Store assets in Azure Blob Storage and download them during every inference
D) Use a compute instance for hosting the model instead of a managed endpoint
Answer:
B
Explanation:
Azure ML managed online endpoints require efficient, low-latency request handling. A major DP-100 topic is understanding the scoring script structure, especially the difference between the init() and run() functions. The init() function executes once when the container starts, which makes it ideal for loading heavy or static assets such as tokenizer files, embedding matrices, large lookup tables, and pre-initialized models. Loading these assets repeatedly inside run() introduces network overhead, I/O delays, and inconsistent latency.
Option B is correct because packaging all required assets into the Docker image or conda environment ensures that the deployment environment is fully self-contained. Loading these assets during init() ensures that they remain in memory throughout the lifetime of the container, eliminating repeated disk or network reads. This approach results in significantly lower latency and more predictable performance. It also improves reliability because assets are available even when network access is limited or blocked. The DP-100 exam emphasizes using init() for loading models and heavy assets, and using custom Docker images for complex dependencies.
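A minimal scoring-script sketch of this init()/run() split, assuming the model was registered with the deployment and the tokenizer was packaged as a joblib file inside the image; the file names and tokenizer API are illustrative.

```python
import json
import os
import joblib

model = None
tokenizer = None

def init():
    # Runs once when the container starts: load the model and heavy static assets into memory
    global model, tokenizer
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")          # set by Azure ML for registered models
    model = joblib.load(os.path.join(model_dir, "model.pkl"))
    tokenizer = joblib.load("assets/tokenizer.pkl")          # assumed path baked into the image

def run(raw_data):
    # Runs per request: only lightweight work, everything heavy is already loaded
    data = json.loads(raw_data)
    tokens = tokenizer.transform(data["text"])
    return {"predictions": model.predict(tokens).tolist()}
```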
Option A is incorrect because loading assets during run() leads to severe performance degradation. Each request triggers extra I/O operations, resulting in high latency and increased compute costs.
Option C is incorrect because downloading assets during inference is extremely slow and introduces failure points. This contradicts best practices for endpoint reliability and performance.
Option D is incorrect because compute instances are intended for development, not production inference. They do not support autoscaling, load balancing, or production SLA guarantees.
Therefore, option B is the correct solution because it ensures optimized inference performance, faster loading times, and robust deployment according to Azure ML best practices.
Question 25:
Your team is conducting a large hyperparameter tuning experiment using Azure ML HyperDrive. The experiment includes hundreds of runs, and model evaluation metrics vary significantly across runs. You want Azure ML to automatically terminate low-performing runs early so compute resources focus only on promising configurations. What HyperDrive feature provides this capability?
A) Fixed sampling strategy
B) Bandit early termination policy
C) Bayesian sampling without termination settings
D) Uniform random sampling with no restrictions
Answer:
B
Explanation:
HyperDrive is one of the most important optimization services in Azure ML and heavily emphasized in the DP-100 exam. Large hyperparameter tuning experiments can quickly accumulate substantial compute cost, especially when runs vary in duration or produce poor results early in training. Early termination policies solve this problem by halting underperforming runs before they consume full compute cycles.
Option B is correct because the Bandit policy monitors a primary metric and terminates runs that fail to reach performance thresholds relative to the best-performing run so far. Bandit evaluates periodically and stops runs that fall behind by a configurable slack factor. This significantly reduces wasted compute time and improves overall search efficiency. The DP-100 exam highlights Bandit as the most flexible and commonly used early termination strategy.
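A short sketch of attaching a Bandit policy to a HyperDrive configuration, assuming an existing ScriptRunConfig named src; the metric name, slack factor, and run limits are placeholders.

```python
from azureml.train.hyperdrive import (
    HyperDriveConfig, BanditPolicy, RandomParameterSampling,
    PrimaryMetricGoal, uniform, choice,
)

# Terminate runs whose metric falls more than 10% behind the current best,
# checked every 2 metric reports after a 5-report warm-up delay
early_termination = BanditPolicy(slack_factor=0.1, evaluation_interval=2, delay_evaluation=5)

sampling = RandomParameterSampling({
    "--learning-rate": uniform(1e-5, 1e-2),
    "--batch-size": choice(16, 32, 64),
})

hyperdrive_config = HyperDriveConfig(
    run_config=src,                              # existing ScriptRunConfig (assumed)
    hyperparameter_sampling=sampling,
    policy=early_termination,
    primary_metric_name="validation_accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=200,
    max_concurrent_runs=20,
)
```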
Option A is incorrect because fixed sampling simply controls how HyperDrive selects parameter values. It does not terminate poor runs.
Option C is incomplete because Bayesian sampling alone does not include termination logic. Without early termination, poor configurations will run to completion.
Option D is insufficient because uniform random sampling explores the search space but does not prioritize promising configurations or eliminate poor ones.
Therefore, option B is the best solution because Bandit early termination accelerates tuning, reduces cost, and improves resource utilization, fully aligning with Azure ML best practices and DP-100 exam expectations.
Question 26:
You are building a custom Docker image for Azure Machine Learning training. The image includes several large system packages, GPU dependencies, and Python libraries. Training jobs frequently fail during environment setup because the conda environment is being rebuilt at run time. You want to ensure jobs start quickly and reliably. What is the best practice for packaging dependencies?
A) Include all dependencies in the Docker image and disable conda environment creation
B) Install dependencies dynamically with pip inside the training script
C) Allow Azure ML to auto-install missing packages at job submission
D) Build the environment on a compute instance and copy it manually into the job directory
Answer:
A
Explanation:
Building custom Docker images is a common requirement in enterprise ML workflows, especially when working with complex GPU frameworks, CUDA toolkits, system-level dependencies, or strict versioning requirements. Azure Machine Learning provides full support for custom images, but the DP-100 exam emphasizes that environments must be reproducible, optimized, and stable during execution. Job startup failures frequently occur when environments are built at run time, particularly when conda environments are recreated or large pip packages must be installed each time.
Option A is correct because including all dependencies directly in the Docker image ensures that environment setup is complete before the job starts. This eliminates the need for extensive environment recreation, reduces startup overhead, and minimizes the risk of runtime installation failures. When dependencies are pre-baked into the container, Azure ML simply loads the image onto compute nodes, dramatically accelerating job initialization. This method also improves determinism because every node uses the exact same environment, reducing issues related to inconsistent package versions or missing libraries. For GPU workloads, embedding CUDA versions and compatible deep learning frameworks in the image is often mandatory to avoid mismatch errors. The DP-100 exam stresses the importance of reproducibility and consistent environments, especially when using distributed compute clusters.
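A brief sketch of registering a pre-built image as an Azure ML environment with run-time conda creation disabled; the registry address and image tag are placeholders.

```python
from azureml.core import Environment, Workspace

ws = Workspace.from_config()

# Image built and pushed beforehand with all CUDA, system, and Python dependencies baked in
env = Environment(name="gpu-train-prebuilt")
env.docker.base_image = "myregistry.azurecr.io/train:gpu-cuda-v1"   # assumed image tag
env.python.user_managed_dependencies = True   # tell Azure ML not to build a conda environment at run time

env.register(workspace=ws)
```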
Option B is incorrect because installing dependencies dynamically inside the training script is slow and prone to error. It leads to long job initialization times and prevents reproducibility because dependency versions may change over time. In production settings, execution environments should never rely on pip installs during runtime.
Option C is insufficient because auto-installation of missing packages can introduce unpredictable behavior, dependency conflicts, and slowdowns. Azure ML does support automatic environment resolution, but this is recommended only for lightweight experimental workloads, not for production or large distributed training jobs.
Option D is incorrect because manually copying environments from a compute instance bypasses Azure ML’s environment management system. This approach lacks reproducibility and often leads to inconsistencies when jobs run on different compute nodes. Local environments cannot be reliably transferred into distributed compute environments, and Azure ML cannot track or guarantee compatibility.
Thus, option A is the best solution. By packaging all system dependencies, Python packages, and GPU frameworks into the custom Docker image, you ensure that training jobs start quickly, remain reproducible, and scale reliably. This approach aligns with best practices recommended in the DP-100 exam and ensures fast, consistent training execution across compute nodes.
Question 27:
A data scientist is designing a model evaluation stage in an Azure Machine Learning pipeline. The evaluation step needs access to the model generated in the training step and must store results such as confusion matrices, plots, and evaluation metrics. What is the best way to pass the trained model to the evaluation step?
A) Save the model to a local folder within the training compute node
B) Use PipelineData or OutputFileDatasetConfig to pass the model artifact between steps
C) Re-train the model inside the evaluation step to regenerate the artifact
D) Upload the model manually to Azure Storage and reference it via a static URI
Answer:
B
Explanation:
Passing artifacts correctly between steps in an Azure ML pipeline is a core DP-100 skill. ML pipelines segment the workflow into modular components — such as preprocessing, training, evaluation, and deployment preparation — and each step may produce outputs required by subsequent steps. Artifact management must be automated, reproducible, and tightly integrated with Azure ML’s pipeline execution engine. PipelineData and OutputFileDatasetConfig are specifically designed for transferring intermediate files, models, datasets, and other binary artifacts.
Option B is correct because PipelineData and OutputFileDatasetConfig allow Azure ML to store output artifacts in a managed datastore and make them available to downstream steps. When the training step writes the model file (for example, a pickle file, ONNX model, or TensorFlow SavedModel) to PipelineData, Azure ML automatically persists it and passes a reference to the evaluation step. This maintains lineage, reproducibility, and proper versioning while simplifying artifact management. Downstream steps can directly access the model using standardized paths, and Azure ML ensures that the correct version of the artifact is available during execution. This is the recommended and exam-aligned way to handle intermediate model outputs.
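A simplified sketch of passing a model artifact between two steps with PipelineData; the script names, cluster name, and directory layout are assumptions.

```python
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Intermediate location managed by Azure ML for the trained model
model_output = PipelineData("trained_model", datastore=datastore)

train_step = PythonScriptStep(
    name="train",
    script_name="train.py",
    arguments=["--model-dir", model_output],   # train.py writes model files here
    outputs=[model_output],
    compute_target="cpu-cluster",
    source_directory="src",
)

eval_step = PythonScriptStep(
    name="evaluate",
    script_name="evaluate.py",
    arguments=["--model-dir", model_output],   # evaluate.py reads the same artifact
    inputs=[model_output],
    compute_target="cpu-cluster",
    source_directory="src",
)

pipeline = Pipeline(workspace=ws, steps=[train_step, eval_step])
```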
Option A is incorrect because local folders on compute nodes are ephemeral and not guaranteed to persist across pipeline steps. When training completes, the compute node may be released, causing the model to be lost.
Option C is incorrect because re-training the model wastes compute resources, breaks pipeline efficiency principles, and introduces inconsistency if training is nondeterministic.
Option D is insufficient because manually uploading the model to Azure Storage requires manual path management, lacks lineage integration, and breaks automation. Azure ML pipelines are designed to avoid manual artifact management.
Therefore, option B is the correct solution because it leverages Azure ML’s built-in artifact passing mechanism and ensures reproducible, maintainable workflows.
Question 28:
You want to deploy a machine learning model that requires GPU acceleration for inference due to high computational load. The endpoint must autoscale based on traffic and provide low-latency responses. Which Azure Machine Learning deployment option best meets these requirements?
A) Batch endpoint on a CPU cluster
B) Managed online endpoint using GPU-enabled compute
C) Local Docker container running on a developer machine
D) Azure ML pipeline with scheduled triggers
Answer:
B
Explanation:
Real-time inference with GPU acceleration is a common requirement for computationally heavy models such as deep neural networks for image processing, transformer-based NLP models, or real-time recommendation engines. Azure Machine Learning supports two main categories of deployment: managed online endpoints and batch endpoints. The DP-100 exam tests the ability to choose the correct type based on latency, scale, and compute requirements.
Option B is correct because managed online endpoints support GPU-backed compute instances, autoscaling, and low-latency requests. These endpoints are designed specifically for real-time inference workloads where performance, throughput, and dynamic scaling are critical. Using GPU instances enables rapid execution of inference tasks that cannot be efficiently handled by CPUs. Managed online endpoints also allow deploying multiple model versions, routing traffic intelligently, performing rolling updates, and monitoring performance via Application Insights. These capabilities align directly with Azure ML’s real-time deployment best practices.
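A rough SDK v2 sketch of a GPU-backed managed online deployment; the endpoint name, registered environment reference, instance SKU, and paths are assumptions, and a production deployment would also configure autoscaling rules and traffic allocation.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint, ManagedOnlineDeployment, Model, CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

endpoint = ManagedOnlineEndpoint(name="vision-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="vision-endpoint",
    model=Model(path="model/"),                           # local model folder (assumed)
    environment="azureml:gpu-inference-env:1",            # assumed registered environment reference
    code_configuration=CodeConfiguration(code="src", scoring_script="score.py"),
    instance_type="Standard_NC6s_v3",                     # GPU SKU (V100)
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```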
Option A is incorrect because batch endpoints are intended for asynchronous offline processing, not low-latency real-time inference. They do not support individual, real-time request handling.
Option C is incorrect because local deployment cannot scale, lacks availability guarantees, and is not suitable for production workloads.
Option D is incorrect because pipelines are used for orchestrated workflows, preprocessing, batch processing, and MLOps automation. They are not designed for real-time inference or autoscaling.
Thus, option B is the correct answer and matches Azure ML’s recommended architecture for GPU-enabled, scalable real-time model serving.
Question 29:
You are using Azure ML HyperDrive to optimize hyperparameters for a deep learning model. Your primary metric is validation accuracy. You want HyperDrive to prioritize exploring areas of the parameter space where performance is improving, while still exploring new regions intelligently. Which HyperDrive sampling strategy should you choose?
A) Grid sampling
B) Random sampling
C) Bayesian sampling
D) No sampling strategy
Answer:
C
Explanation:
Hyperparameter optimization strategies determine how HyperDrive searches the parameter space. The DP-100 exam emphasizes choosing the correct strategy based on problem requirements, search efficiency, and model complexity. Bayesian sampling is one of the most advanced and effective search strategies supported by HyperDrive.
Option C is correct because Bayesian sampling uses Bayesian optimization to intelligently navigate the hyperparameter space. It builds a probabilistic model of how hyperparameters relate to the primary metric and selects new configurations that balance exploration (evaluating new areas) and exploitation (refining promising areas). This approach reduces the number of runs needed to find optimal parameters compared to random or grid search. Bayesian sampling is ideal for expensive deep learning training jobs where each run is computationally intensive and must be chosen carefully.
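A short sketch of Bayesian sampling in HyperDrive, again assuming an existing ScriptRunConfig named src; note that Bayesian sampling does not support early termination policies, so none is set here.

```python
from azureml.train.hyperdrive import (
    HyperDriveConfig, BayesianParameterSampling, PrimaryMetricGoal, choice, uniform,
)

# Bayesian sampling models the relationship between parameters and the primary metric
# and proposes new configurations that balance exploration and exploitation
sampling = BayesianParameterSampling({
    "--learning-rate": uniform(1e-5, 1e-2),
    "--num-layers": choice(2, 4, 6),
    "--dropout": uniform(0.0, 0.5),
})

hyperdrive_config = HyperDriveConfig(
    run_config=src,                            # existing ScriptRunConfig (assumed)
    hyperparameter_sampling=sampling,
    primary_metric_name="validation_accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=100,
    max_concurrent_runs=10,
)
```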
Option A, grid sampling, exhaustively searches a predefined set of values. It is inefficient for high-dimensional or continuous parameter spaces and cannot prioritize promising regions.
Option B, random sampling, explores broadly but does not focus on areas showing improvement. It is more efficient than grid sampling but less intelligent than Bayesian sampling.
Option D is incorrect because HyperDrive requires a sampling strategy to perform tuning.
Therefore, option C is the best choice because Bayesian sampling intelligently optimizes search efficiency and focuses compute resources on the most promising model configurations, fully aligned with DP-100 best practices.
Question 30:
A data scientist wants to organize their Azure Machine Learning workspace so that multiple versions of the same dataset can be tracked, queried, and reused while maintaining full lineage. They also want to ensure that training pipelines always pull the correct dataset version. What Azure ML feature should they use?
A) Azure Blob Storage containers only
B) Azure ML Dataset versioning
C) Local file storage on the compute cluster
D) Inline CSV uploads inside scripts
Answer:
B
Explanation:
Data versioning is one of the most critical components of reproducible machine learning systems. The DP-100 exam heavily emphasizes using Azure ML Datasets to ensure traceability, consistency, and lineage tracking. Azure ML supports both TabularDataset and FileDataset objects, each of which can be versioned to maintain historical snapshots of raw or processed data. When pipelines reference a specific dataset version, Azure ML guarantees that the same data is used in future runs, resolving reproducibility challenges.
Option B is correct because Azure ML Dataset versioning provides a structured way to manage data evolution. When new data arrives or preprocessing changes, data scientists can create a new dataset version while preserving the old one. Azure ML automatically tracks lineage, allowing teams to determine which experiment or model used which dataset version. Versioned datasets integrate seamlessly with pipelines, ensuring that training jobs consistently reference the intended dataset version regardless of updates.
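A minimal sketch of registering and pinning dataset versions; the datastore path, dataset name, and version numbers are placeholders.

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Register a new version of the same named dataset whenever the underlying data changes
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "raw/2024-06/*.csv"))
dataset.register(
    workspace=ws,
    name="customer-transactions",
    create_new_version=True,
)

# Pipelines can pin an exact version (or request "latest") for reproducible training
pinned = Dataset.get_by_name(ws, name="customer-transactions", version=3)
latest = Dataset.get_by_name(ws, name="customer-transactions", version="latest")
```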
Option A is insufficient because Blob Storage alone does not provide dataset abstraction, version tracking, or lineage metadata. Teams must manage files manually, which is error-prone and not aligned with MLOps best practices.
Option C is incorrect because compute cluster storage is ephemeral and not intended for storing versioned datasets. Data stored locally cannot be shared reliably across runs or users.
Option D is incorrect because inline data uploads inside scripts lack versioning, reproducibility, and governance. This approach is viable only for quick experiments.
Thus, option B is the correct solution because Dataset versioning provides full lineage, reproducibility, and integration with Azure ML pipelines and experiments, matching best practices taught in the DP-100 curriculum.
Question 31:
A data engineering team has created several feature extraction scripts that run on Azure ML compute clusters. These scripts generate large intermediate feature files used by downstream training steps. The team wants to ensure these intermediate files are versioned, stored securely, and automatically available in the training steps that follow. What Azure ML capability best supports this requirement?
A) Use local filesystem storage across compute nodes
B) Use PipelineData or OutputFileDatasetConfig to store and version intermediate artifacts
C) Write intermediate files to a temporary directory and copy manually
D) Upload the files to GitHub for versioning
Answer:
B
Explanation:
In Azure Machine Learning, data lineage and artifact management are essential for building reproducible and automated ML workflows. When building multi-step pipelines, one of the key goals is ensuring that every step’s outputs are captured, stored, versioned, and passed to subsequent steps without requiring manual intervention. Azure ML provides specific tools designed for this exact purpose, central to the DP-100 curriculum.
Option B is correct because PipelineData and OutputFileDatasetConfig are features explicitly created to handle intermediate artifacts in Azure ML pipelines. They allow developers to define structured outputs for pipeline steps and automatically store them in the workspace datastore, where they can be versioned and reused. Once a training step writes its output to PipelineData, the next step can consume that object without requiring knowledge of the underlying storage path. Azure ML handles storage, lineage, access permissions, and version consistency. This aligns perfectly with best practices in managed machine learning environments, ensuring full traceability of feature engineering operations.
Furthermore, PipelineData allows Azure ML to structure intermediate data into logical units recognized by the pipeline engine. This is critical for reproducibility because Azure ML automatically records which versions of each dataset were used to produce downstream artifacts. If the same pipeline is re-executed, Azure ML can reuse cached versions instead of recomputing, depending on step reuse settings. Thus, the team benefits both from traceability and pipeline efficiency.
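As a sketch of this idea, OutputFileDatasetConfig can also register the intermediate output as a versioned dataset when the producing step finishes; the script names, datastore path, and dataset name below are illustrative.

```python
from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Files written to this output are uploaded to the datastore and registered
# as a new version of a FileDataset when the step completes
features = (
    OutputFileDatasetConfig(destination=(datastore, "features/{run-id}"))
    .register_on_complete(name="engineered_features")
)

feature_step = PythonScriptStep(
    name="feature-engineering",
    script_name="featurize.py",
    arguments=["--output-dir", features],
    compute_target="cpu-cluster",
    source_directory="src",
)

train_step = PythonScriptStep(
    name="train",
    script_name="train.py",
    arguments=["--features-dir", features.as_input(name="features")],
    compute_target="cpu-cluster",
    source_directory="src",
)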
Option A is incorrect because local filesystem storage on compute clusters is ephemeral. When a job ends or nodes scale down, the data is lost. This violates the requirement of secure and versioned storage.
Option C is incorrect because manually copying files introduces human error risk, breaks automation, and results in inconsistent versioning.
Option D is entirely inappropriate because GitHub is not a suitable location for large binary feature files and provides no integration with Azure ML pipeline data lineage. It also violates best practices for storing large artifacts.
Therefore, option B is the correct choice because it directly supports data lineage, reproducibility, artifact versioning, security, and automated pipeline execution, all of which are emphasized heavily in the DP-100 exam.
Question 32:
A data scientist is training a model using Azure ML and wants to log not only scalar metrics but also images such as confusion matrices, ROC curves, and distribution plots. They want these visual artifacts to be stored per run and accessible in Azure ML Studio. Which method should they use?
A) Save all images locally on the compute node and view them manually
B) Use run.log_image() to upload visual artifacts as part of the experiment
C) Store images in Azure Blob Storage manually and track the URI in a text file
D) Upload plots to GitHub Pages for visualization
Answer:
B
Explanation:
Visualization of model performance is a critical component of machine learning experimentation. Azure Machine Learning provides comprehensive logging APIs that allow developers to store metrics, images, files, and artifacts associated with a training run. The DP-100 exam stresses the use of run.log(), run.log_list(), and run.log_image() for complete experiment tracking.
Option B is correct because run.log_image() is specifically designed to capture plots and images and store them as artifacts in the Azure ML Run history. Whether the data scientist generates a confusion matrix using matplotlib, a feature distribution plot, or an ROC curve, run.log_image() will automatically upload the image to Azure ML, where it becomes part of the run’s artifact store. These images are visible in the run details in Azure ML Studio and enable side-by-side run comparisons. They are also preserved over time, enabling full traceability.
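A brief sketch of logging a confusion-matrix plot from inside a training script, assuming y_true and y_pred were produced earlier in the script.

```python
import matplotlib.pyplot as plt
from azureml.core import Run
from sklearn.metrics import ConfusionMatrixDisplay

run = Run.get_context()   # the current run inside the training script

# y_true / y_pred are assumed to come from the evaluation step of the script
ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.title("Confusion matrix")

# Attach the current figure to the run so it is stored and shown in Azure ML Studio
run.log_image(name="confusion_matrix", plot=plt)

# Scalar metrics go through run.log()
run.log("validation_accuracy", 0.93)
```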
Option A is incorrect because storing images locally on compute nodes does not persist them. Compute clusters use ephemeral storage, meaning images will be lost after the job completes.
Option C is insufficient because manual upload to Blob Storage breaks the integrated tracking model. Azure ML cannot automatically associate stored URIs with particular runs, making comparison and lineage difficult.
Option D is impractical because GitHub Pages is not meant for storing ML artifacts and does not integrate with Azure ML tracking.
Thus, option B is the correct solution because run.log_image() is the Azure ML-native approach for uploading and tracking visual experiment outputs, aligning fully with DP-100 best practices.
Question 33:
Your Azure Machine Learning training job uses a TabularDataset created from CSV files stored in Azure Blob Storage. Training takes significantly longer than expected because data loading is slow. You want to optimize reading performance while preserving dataset structure. What is the recommended solution?
A) Convert the dataset to Parquet format and recreate the TabularDataset
B) Copy CSV files to the compute node’s local file system before training
C) Enable parallel loading by increasing the number of workers only
D) Switch to using JSON files instead of CSV files
Answer:
A
Explanation:
Data loading can easily become a bottleneck during training, especially when using large datasets stored in remote storage. Azure ML’s TabularDataset supports multiple file formats, and choosing the right one has a significant impact on performance. The DP-100 exam pays particular attention to the efficiency advantages of columnar storage formats such as Parquet.
Option A is correct because Parquet is a columnar, compressed, binary format designed for analytical workloads. When TabularDataset reads Parquet files, it benefits from embedded schema metadata, efficient I/O, faster column filtering, and reduced storage overhead. Parquet is substantially more efficient than CSV for distributed training, as CSV parsing is computationally expensive. Reading from Parquet significantly speeds up data loading, especially when combined with scalable compute clusters. This improvement directly translates to faster end-to-end training times. Converting CSVs to Parquet is a widely recognized best practice in Azure ML and big data environments.
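A hedged sketch of the one-off conversion and dataset re-creation; the local paths, datastore paths, and dataset name are placeholders.

```python
import pandas as pd
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# One-off conversion: read the CSVs once, write compressed columnar Parquet
df = pd.read_csv("local_data/records.csv")
df.to_parquet("local_data/records.parquet", compression="snappy")   # requires pyarrow

# Upload the Parquet files and recreate the TabularDataset on top of them
datastore.upload(src_dir="local_data", target_path="parquet/records", overwrite=True)
dataset = Dataset.Tabular.from_parquet_files(path=(datastore, "parquet/records/*.parquet"))
dataset.register(ws, name="records-parquet", create_new_version=True)
```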
Option B is partially helpful but incomplete. Copying data to the compute node’s local SSD does reduce network overhead, but the CSV parsing cost remains high. The fundamental issue is format inefficiency, not simply data location.
Option C is insufficient because increasing the number of workers only increases parallel parsing overhead without addressing the inherent inefficiency of CSV.
Option D is incorrect because JSON is typically even slower than CSV for large-scale tabular data and introduces unnecessary complexity.
Thus, option A is the best solution because Parquet format fundamentally optimizes I/O performance, supports efficient analytical workloads, reduces parsing cost, and integrates directly with Azure ML TabularDatasets.
Question 34:
A company uses Azure ML to train models for fraud detection. The training pipeline includes steps for data preprocessing, feature engineering, model training, and model evaluation. They want to schedule the entire pipeline to run weekly with new data. Which Azure service should they use to automate weekly pipeline execution?
A) Azure Event Grid
B) Azure Logic Apps or Azure Data Factory scheduling
C) Azure Kubernetes Service
D) Azure Monitor alerts
Answer:
B
Explanation:
Automating recurring ML workflows is a core part of MLOps, and Azure ML pipelines integrate seamlessly with Azure Logic Apps and Azure Data Factory. The DP-100 exam expects candidates to know how to orchestrate scheduled pipeline runs for data ingestion, retraining, evaluation, and model registration.
Option B is correct because Azure Logic Apps and Azure Data Factory both support time-based scheduling. They can trigger Azure ML pipeline endpoints on weekly intervals, providing full MLOps automation without requiring manual intervention. Logic Apps offers flexible orchestration patterns and integrates with numerous Azure services. Data Factory also supports time-triggered pipelines and integrates well with data engineering workflows. Both are recognized Azure ML best practices for scheduled automation.
Option A is incorrect because Event Grid triggers are event-based, not schedule-based. Event Grid is ideal for triggering retraining when new data arrives, not for weekly schedules.
Option C is incorrect because Azure Kubernetes Service is a compute platform, not a scheduler.
Option D is incorrect because Azure Monitor alerts are for reacting to metrics and logs, not for time-based execution scheduling.
Thus, option B is the best answer for scheduling weekly automated ML pipeline execution.
Question 35:
You are optimizing a distributed training job using multiple GPUs per node on an Azure ML compute cluster. You want to ensure each GPU runs a dedicated worker process. Which configuration should you use to control the number of processes per node?
A) Set node_count equal to the number of GPUs
B) Use DistributedRunConfig with process_count_per_node equal to the GPU count
C) Set max_run_duration_seconds to limit worker lifespan
D) Increase the VM size to ensure more CPU resources per worker
Answer:
B
Explanation:
The DP-100 exam includes distributed training configuration topics, particularly around matching worker processes to GPU resources. In multi-GPU training, each GPU should be assigned a dedicated worker process to ensure optimal performance. Azure ML handles distributed execution through a distributed job configuration attached to the run (exposed in the Python SDK through classes such as MpiConfiguration and PyTorchConfiguration).
Option B is correct because process_count_per_node defines exactly how many worker processes run on each node. Setting process_count_per_node equal to the number of GPUs ensures that each GPU receives its own worker. This is essential for frameworks such as PyTorch DistributedDataParallel and Horovod, which expect one process per GPU. It also ensures efficient use of GPU memory and reduces resource contention.
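For illustration, a short SDK v1 sketch of this configuration; the question labels it DistributedRunConfig, while the Python SDK exposes the equivalent settings through MpiConfiguration, shown here with assumed cluster and environment names.

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig, Environment
from azureml.core.runconfig import MpiConfiguration

ws = Workspace.from_config()

# 2 nodes with 4 GPUs each -> 4 worker processes per node, one per GPU
distributed_config = MpiConfiguration(process_count_per_node=4, node_count=2)

src = ScriptRunConfig(
    source_directory="src",
    script="train_ddp.py",                 # script launches one DistributedDataParallel worker per process
    compute_target="gpu-cluster",          # assumed cluster of 4-GPU VMs
    environment=Environment.get(ws, "pytorch-gpu-env"),
    distributed_job_config=distributed_config,
)

run = Experiment(ws, "distributed-training").submit(src)
```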
Option A is incorrect because node_count determines the number of nodes, not processes per node.
Option C is unrelated to resource assignment and does not control GPU allocation.
Option D does not guarantee GPU distribution; CPU resources do not dictate GPU worker mapping.
Thus, option B is the correct configuration for multi-GPU distributed training in Azure ML.
Question 36:
Your team deploys a machine learning model to a managed online endpoint. Over time, the model’s performance degrades due to data drift. You want to detect drift automatically and trigger alerts or retraining when drift is detected. Which Azure ML capability is designed for this purpose?
A) Azure Monitor CPU alerts
B) Azure ML Data Drift Monitor in Datasets
C) Manual comparison of CSV files
D) Offline scoring using Jupyter notebooks
Answer:
B
Explanation:
Data drift monitoring is one of the most important MLOps concepts highlighted in the DP-100 exam. Azure ML Dataset monitors allow teams to track statistical drift between reference data (such as training data) and production data. Drift detection helps teams determine when retraining is required.
Option B is correct because the Data Drift Monitor computes summary statistics, distributional metrics, and divergence scores automatically. It can trigger alerts through Azure Monitor or Logic Apps and provides dashboards for visualizing drift trends. It integrates directly with registered datasets.
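A rough sketch using the azureml-datadrift package, assuming registered baseline and target datasets (the target must include a timestamp column) and an existing compute cluster; the names, frequency, and threshold are placeholders.

```python
from azureml.core import Workspace, Dataset
from azureml.datadrift import DataDriftDetector, AlertConfiguration

ws = Workspace.from_config()
baseline = Dataset.get_by_name(ws, "training-data")   # reference snapshot used for training
target = Dataset.get_by_name(ws, "scoring-data")      # timestamped production data

monitor = DataDriftDetector.create_from_datasets(
    workspace=ws,
    name="fraud-model-drift",
    baseline_data_set=baseline,
    target_data_set=target,
    compute_target="cpu-cluster",
    frequency="Week",
    drift_threshold=0.3,
    alert_config=AlertConfiguration(email_addresses=["ml-team@contoso.com"]),
)
monitor.enable_schedule()   # start the recurring drift computation
```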
Option A is irrelevant because CPU alerts have nothing to do with model performance or drift.
Option C is inadequate and manual, lacking automation or statistical comparison tools.
Option D is not scalable and provides no automated monitoring.
Thus, option B best addresses automatic drift detection.
Question 37:
A data scientist wants to improve training throughput for a PyTorch model on Azure ML. They notice that data loading is the bottleneck due to slow augmentations. Which strategy best improves throughput while preserving augmentation quality?
A) Move augmentations to the GPU using libraries such as NVIDIA DALI
B) Disable augmentations entirely
C) Decrease the number of DataLoader workers
D) Reduce GPU count
Answer:
A
Explanation:
In deep learning, especially computer vision, augmentations are often CPU-bound. When using Azure ML GPU clusters, the CPU can become a bottleneck, starving GPUs. The DP-100 exam covers performance optimization through hardware-accelerated preprocessing.
Option A is correct because moving augmentations to the GPU using libraries like NVIDIA DALI significantly accelerates preprocessing pipelines. GPUs perform image transformations quickly in parallel, feeding augmented batches faster to the training loop. This results in higher GPU utilization, reduced training time, and efficient scaling.
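A hedged DALI sketch of GPU-side decoding and augmentation feeding a PyTorch iterator; the paths, image size, and normalization constants are illustrative, and the exact operators available depend on the installed DALI version.

```python
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

@pipeline_def
def train_pipe(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")          # JPEG decode on the GPU
    images = fn.random_resized_crop(images, size=224)          # GPU-side augmentation
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=fn.random.coin_flip(),                          # random horizontal flip
    )
    return images, labels

pipe = train_pipe(data_dir="/data/train", batch_size=64, num_threads=4, device_id=0)
pipe.build()
train_loader = DALIGenericIterator([pipe], ["images", "labels"], reader_name="Reader")
```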
Option B harms model quality because augmentations improve generalization.
Option C makes the bottleneck worse by reducing worker threads.
Option D unnecessarily reduces compute power.
Thus, option A is the most effective strategy.
Question 38:
A machine learning team needs to deploy a model requiring multiple steps: input validation, preprocessing, and final inference. They want to deploy it as a single endpoint. What is the best deployment approach?
A) Chain multiple endpoints together
B) Package all steps inside a single scoring script and environment
C) Run preprocessing externally before calling the endpoint
D) Deploy each step as a separate service
Answer:
B
Explanation:
Azure ML managed endpoints use a scoring script with init() and run(). The DP-100 exam highlights packaging full inference pipelines into a single endpoint to simplify production use.
Option B is correct because packaging preprocessing, validation, and model inference inside one scoring script ensures consistent transformations, simplifies API usage, and improves reliability. This is standard best practice.
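A compact sketch of a single scoring script that chains validation, preprocessing, and inference; the artifact paths and the request payload schema are assumptions.

```python
import json
import joblib
import numpy as np

model = None
scaler = None

def init():
    # Load the model and the fitted preprocessing objects once at container start
    global model, scaler
    model = joblib.load("model/model.pkl")    # assumed artifact paths
    scaler = joblib.load("model/scaler.pkl")

def run(raw_data):
    payload = json.loads(raw_data)

    # 1. Input validation
    if "features" not in payload:
        return {"error": "request must contain a 'features' field"}

    # 2. Preprocessing with the same transformer used at training time
    features = scaler.transform(np.array(payload["features"]))

    # 3. Inference
    return {"predictions": model.predict(features).tolist()}
```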
Option A is inefficient and increases latency.
Option C offloads responsibility to clients, reducing reliability.
Option D complicates orchestration, monitoring, and scaling.
Thus, option B is correct.
Question 39:
A team wants to run a training experiment that uses spot (low-priority) VMs to reduce cost. What is the main risk associated with using spot nodes in Azure ML?
A) Spot nodes run slower
B) Spot nodes can be preempted at any time
C) Spot nodes cannot run GPU workloads
D) Spot nodes cannot install Python dependencies
Answer:
B
Explanation:
Spot/low-priority VMs are cost-effective but come with the risk of preemption. Azure ML supports spot nodes, and the DP-100 exam covers their trade-offs.
Option B is correct because spot nodes may be evicted when capacity is needed. Jobs must handle interruptions.
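A short sketch of provisioning a low-priority cluster with SDK v1; the VM size and node counts are placeholders, and the training script itself should checkpoint regularly so it can resume after an eviction.

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# Low-priority (spot) nodes trade preemption risk for a much lower price
config = AmlCompute.provisioning_configuration(
    vm_size="Standard_NC6s_v3",
    vm_priority="lowpriority",     # the only change versus a dedicated cluster
    min_nodes=0,
    max_nodes=4,
)

cluster = ComputeTarget.create(ws, name="spot-gpu-cluster", provisioning_configuration=config)
cluster.wait_for_completion(show_output=True)
```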
Option A is false; spot nodes have identical performance to regular VMs.
Option C is incorrect; Azure supports GPU spot nodes.
Option D is incorrect; they install dependencies normally.
Thus, option B is correct.
Question 40:
A data scientist needs to ensure that training results are reproducible even when using autoscaling compute clusters. Which Azure ML feature best ensures environment consistency across all compute nodes?
A) Local pip installs
B) Registered Azure ML Environments
C) Manual package management on each node
D) Installing dependencies inside the script
Answer:
B
Explanation:
Reproducibility requires consistent environments. Azure ML Environments ensure every node uses identical packages, versions, and dependencies. The DP-100 exam emphasizes registering environments for deterministic execution.
Option B is correct because registered environments guarantee that Azure ML provisions identical containers or conda environments across all nodes. This ensures runs are reproducible on autoscaling clusters.
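A minimal sketch of registering an environment once and reusing it in a run configuration; the file, environment, and cluster names are assumptions.

```python
from azureml.core import Environment, Workspace, ScriptRunConfig

ws = Workspace.from_config()

# Define the environment once from a conda spec and register it in the workspace
env = Environment.from_conda_specification(name="train-env", file_path="conda.yml")
env.register(workspace=ws)

# Every node of the autoscaling cluster receives the same pinned, versioned environment
src = ScriptRunConfig(
    source_directory="src",
    script="train.py",
    compute_target="cpu-cluster",
    environment=Environment.get(ws, name="train-env"),   # optionally pin a version with version="3"
)
```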
The other options introduce inconsistency and error: local pip installs and in-script installation depend on whatever versions happen to resolve at run time, and manual package management cannot be applied uniformly to nodes that an autoscaling cluster creates and removes on demand.
Thus, option B is the correct answer.