Microsoft DP-900 Azure Data Fundamentals Exam Dumps and Practice Test Questions Set 6 101-120

Visit here for our full Microsoft DP-900 exam dumps and practice test questions.

Question 101:

A company is designing a modern analytics platform using Azure services. They want to follow the medallion architecture, where raw data is stored in a bronze layer, cleaned and structured data is stored in silver, and aggregated analytical data is stored in gold. They need a technology that supports ACID transactions, schema evolution, time travel, and integrates well with Azure Data Lake Storage and Azure Databricks. Which technology should they use to implement their medallion layers?

Options:

A) CSV files in Azure Data Lake
B) Delta Lake
C) JSON files in Azure Data Lake
D) Azure SQL Database

Answer: B

Explanation:

Delta Lake is the best choice for implementing the medallion architecture because it supports ACID transactions, schema evolution, time travel, and scalable metadata handling while running on top of Azure Data Lake Storage. These core features make Delta Lake uniquely suited for modern data engineering practices. The medallion architecture depends on reliable file-based transactional operations in the bronze, silver, and gold layers. CSV or JSON files cannot guarantee transactional consistency or schema enforcement, which leads to potential corruption of downstream layers if files are partially written or incorrectly formatted.

Option A, CSV files, is a common storage format but not suitable for complex enterprise-grade analytics. CSV files offer no schema enforcement, no ACID transactions, and no versioning, making them vulnerable to erroneous writes or inconsistent file schemas that break pipelines. CSVs are also inefficient for large-scale queries due to lack of compression and inefficient parsing.

Option C, JSON files, supports semi-structured data but suffers from similar limitations as CSV. JSON files lack the transactional and schema governance capabilities required for building multi-layered analytical environments. Storing bronze, silver, and gold layers in JSON without a transactional layer would lead to brittleness and unpredictability in pipelines.

Option D, Azure SQL Database, is a strong relational database but is not intended for lakehouse file-based medallion architecture. SQL Database is designed for OLTP workloads and cannot manage massive distributed data storage or integrate with Spark for large-scale transformations. Additionally, SQL Database does not store data in parquet format or support lakehouse operations across hierarchical layers.

Delta Lake, however, was designed to address the limitations of early data lakes. It uses parquet files under the hood but adds a transactional log (the Delta Log), enabling atomic writes, schema validation, and rollback operations. ACID transactions prevent situations where half-written files corrupt downstream processes. Schema evolution allows transformations to adapt when new fields appear in raw datasets, which is common in real-world ingestion pipelines. Time travel enables users to query previous versions of tables, making debugging and analysis far easier.
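
For illustration, here is a minimal PySpark sketch of the Delta Lake behaviors described above (bronze ingestion, schema evolution into silver, and time travel). The storage paths and column names are hypothetical and assume a Spark environment, such as Azure Databricks, with Delta Lake enabled.

```python
# Minimal PySpark sketch of bronze -> silver Delta Lake operations.
# Paths and column names are hypothetical; assumes a Spark session with Delta Lake
# enabled (e.g., an Azure Databricks cluster attached to Azure Data Lake Storage).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze_path = "abfss://lake@mystorageaccount.dfs.core.windows.net/bronze/sensors"
silver_path = "abfss://lake@mystorageaccount.dfs.core.windows.net/silver/sensors"

# Ingest raw files into a bronze Delta table (atomic write via the Delta transaction log).
raw = spark.read.json("abfss://lake@mystorageaccount.dfs.core.windows.net/landing/sensors/*.json")
raw.write.format("delta").mode("append").save(bronze_path)

# Clean and conform into silver; mergeSchema lets newly arriving columns evolve the table schema.
cleaned = (spark.read.format("delta").load(bronze_path)
           .dropDuplicates(["device_id", "event_time"])
           .withColumn("event_date", F.to_date("event_time")))
(cleaned.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(silver_path))

# Time travel: query the silver table as it existed at an earlier version for debugging.
previous = spark.read.format("delta").option("versionAsOf", 0).load(silver_path)
previous.show(5)
```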

Delta Lake integrates seamlessly with Azure Databricks, which provides the compute needed for ETL, machine learning, and SQL analytics. This integration allows data engineers to write reliable pipelines, data analysts to query optimized gold-layer tables, and data scientists to extract features from clean and well-structured silver-layer datasets.

Because the scenario explicitly requires medallion architecture, ACID guarantees, schema evolution, and native compatibility with Azure Databricks, Delta Lake is the only choice that fully satisfies all conditions. Therefore, option B is correct.

Question 102:

A financial institution needs a solution that allows analysts to perform advanced log analysis, time-series processing, anomaly detection, and text search across terabytes of telemetry and security data. They want a specialized analytics engine with fast ingestion and a query language optimized for log exploration. Which Azure service should they choose?

Options:

A) Azure Synapse Dedicated SQL Pool
B) Azure Stream Analytics
C) Azure Data Explorer
D) Azure SQL Managed Instance

Answer: C

Explanation:

Azure Data Explorer is the correct solution because it is engineered specifically to handle massive-scale log, telemetry, and time-series workloads. It is optimized for extremely fast ingestion, often capable of ingesting billions of records per day, and provides ultra-fast query performance even on very large datasets. The Kusto Query Language (KQL) is designed for exploratory analytics, enabling security analysts, financial data teams, and operational engineers to run complex queries involving text search, pattern matching, time window analysis, and anomaly detection.

Option A, Synapse Dedicated SQL Pool, is intended for structured relational analytics and large-scale SQL-based transformations. While powerful for OLAP workloads, it is not optimized for ingestion or querying of raw log data. It cannot match the latency and indexing strategies that Data Explorer provides for telemetry workloads.

Option B, Stream Analytics, processes real-time data but is not designed for deep historical log exploration or large-scale batch queries across terabytes or petabytes of data. Stream Analytics focuses on streaming windows, not long-term data analysis and search.

Option D, SQL Managed Instance, is a relational database service. Although it can store logs, it is not designed for high-velocity ingestion or time-series analytics. Running text search or telemetry queries on SQL Managed Instance would degrade performance significantly.

Azure Data Explorer uses columnar storage, compression, and advanced indexing techniques. It supports ingesting data from Event Hub, IoT Hub, Log Analytics, or direct pipelines. KQL is ideal for scenarios involving cybersecurity, fraud detection, IoT telemetry, user behavior analytics, and compliance auditing. Its ability to perform complex aggregations, joins, and anomaly detection across billions of events in seconds makes it ideal for modern financial analytics pipelines.
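
As a hedged illustration, the sketch below uses the azure-kusto-data Python client to run a KQL query of the kind described. The cluster URL, database, table, and column names are hypothetical.

```python
# Sketch: running a KQL query against Azure Data Explorer from Python.
# Cluster URL, database, table, and column names are hypothetical.
# Requires the azure-kusto-data package and an identity signed in via the Azure CLI.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://mycluster.westeurope.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# KQL: count failed logins per 5-minute bin over the last day and surface suspicious spikes.
query = """
SecurityEvents
| where Timestamp > ago(1d)
| where EventType == "FailedLogin"
| summarize failures = count() by AccountId, bin(Timestamp, 5m)
| where failures > 20
| order by failures desc
"""

response = client.execute("TelemetryDb", query)
for row in response.primary_results[0]:
    print(row["AccountId"], row["failures"])
```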

For these reasons, option C is the only service that fulfills all the requirements of the scenario.

Question 103:

A retail organization needs to provide real-time dashboards in Power BI that update every few seconds based on inventory events streaming in from multiple warehouses. They require a service that can process these events in motion and push results directly to a Power BI streaming dataset. Which Azure service supports this real-time analytics requirement?

Options:

A) Azure Data Factory
B) Azure Stream Analytics
C) Azure Databricks
D) Azure Cosmos DB

Answer: B

Explanation:

Azure Stream Analytics is the right choice for real-time dashboards because it is specifically designed for event stream processing with the capability to push processed results directly into Power BI streaming datasets. It can handle continuous event inflow, apply transformations, perform windowed aggregations, and output results to Power BI for near real-time visualization.

Option A, Data Factory, is not capable of real-time ingestion or continuous processing. It is a batch-oriented ETL and data orchestration service. Using Data Factory for real-time dashboards is impossible because it operates on scheduled triggers rather than continuous streams.

Option C, Azure Databricks, is capable of streaming through Structured Streaming, but it is not purpose-built for direct output to Power BI streaming datasets. While possible through intermediate storage, it does not provide native Power BI sink integration with the simplicity Stream Analytics offers.

Option D, Cosmos DB, can store data with low latency but does not process streaming events or push updates directly into Power BI dashboards.

Stream Analytics supports multiple window functions such as tumbling, hopping, and sliding windows, making it ideal for analyzing real-time events like inventory movements, warehouse activity, IoT sensor readings, and retail operations data. It can read data from IoT Hub, Event Hub, and Kafka-compatible services, apply SQL-like query transformations, and send output directly to Power BI dashboards.
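
The Stream Analytics query itself is authored inside the job rather than in application code, but the sketch below shows the shape of such a pipeline: a hypothetical tumbling-window query that writes to a Power BI output (shown as a comment), plus a Python producer using azure-eventhub that pushes inventory events into the Event Hubs input the job would read from. All names and connection strings are placeholders.

```python
# Sketch: sending inventory events into Event Hubs for a Stream Analytics job to consume.
# The Stream Analytics query below (shown as a comment) would be defined inside the ASA job,
# with an Event Hubs input named 'inventory-in' and a Power BI output named 'powerbi-out'.
#
#   SELECT
#       WarehouseId,
#       SUM(QuantityChange) AS NetChange,
#       System.Timestamp() AS WindowEnd
#   INTO [powerbi-out]
#   FROM [inventory-in] TIMESTAMP BY EventTime
#   GROUP BY WarehouseId, TumblingWindow(second, 5)
#
# Requires the azure-eventhub package; the connection string and hub name are placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-connection-string>",
    eventhub_name="inventory-events",
)

event = {"WarehouseId": "WH-01", "QuantityChange": -3, "EventTime": "2024-01-01T12:00:00Z"}

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)
```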

Because the requirement explicitly calls for real-time processing and direct Power BI integration, Azure Stream Analytics is the only option that fulfills all criteria.

Question 104:

A company wants to secure sensitive fields in its Azure SQL Database by preventing unauthorized users from seeing true values. They want these users to see obfuscated versions while administrators still see full values. The company does not want to encrypt the data, only hide it at query time based on user permissions. Which feature should they implement?

Options:

A) Row-Level Security
B) Dynamic Data Masking
C) Always Encrypted
D) Backup Encryption

Answer: B

Explanation:

Dynamic Data Masking is the correct feature because it allows administrators to control how sensitive data appears to non-privileged users without modifying or encrypting the underlying data. It masks values in query results based on configured rules and user roles. This ensures that unauthorized users cannot view sensitive information but can still work with the data in a limited, masked form.

Row-Level Security, option A, controls access to entire rows, not columns or masked values. It is useful for multi-tenancy or user-based partitioning but does not hide sensitive fields within a row.

Always Encrypted, option C, protects sensitive data both at rest and in transit and prevents even the database engine from reading encrypted fields. However, it requires application changes, client-side encryption, and key management. This scenario explicitly states that encryption is not desired.

Backup Encryption, option D, protects database backups only. It does not hide or alter query-level output.

Dynamic Data Masking offers multiple masking rules including partial masking, random masking, and email masking. This flexibility makes it suitable for financial records, personal data, and regulated fields such as credit card information or government identifiers.
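
As an illustration, the following sketch applies masking rules with standard T-SQL, submitted here through pyodbc from Python. The server, database, table, column, and user names are hypothetical.

```python
# Sketch: applying Dynamic Data Masking rules to an Azure SQL Database table via pyodbc.
# Server, database, credentials, table, column, and user names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=SalesDb;"
    "UID=admin_user;PWD=<password>"
)
cursor = conn.cursor()

# Email masking exposes only the first letter and a constant suffix (aXX@XXXX.com style).
cursor.execute(
    "ALTER TABLE dbo.Customers ALTER COLUMN Email "
    "ADD MASKED WITH (FUNCTION = 'email()');"
)

# Partial masking exposes the first and last character of a card number; the middle is replaced.
cursor.execute(
    "ALTER TABLE dbo.Customers ALTER COLUMN CardNumber "
    "ADD MASKED WITH (FUNCTION = 'partial(1, \"XXXX-XXXX-XXXX-XXX\", 1)');"
)

# Non-privileged users now see masked values; users granted UNMASK see the real data.
cursor.execute("GRANT UNMASK TO [finance_auditor];")
conn.commit()
```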

Thus, option B is the correct solution.

Question 105:

An enterprise with multiple departments wants a centralized governance solution that can scan Azure SQL, Azure Data Lake, Synapse, Power BI, and external systems. They want automated metadata extraction, data lineage tracking, a searchable data catalog, and sensitivity classification. Which Azure service fulfills these requirements?

Options:

A) Azure Key Vault
B) Azure Firewall
C) Microsoft Purview
D) Azure Monitor

Answer: C

Explanation:

Microsoft Purview is the correct answer because it provides a unified data governance solution that can automatically scan, classify, catalog, and track lineage across multiple Azure and non-Azure systems. Organizations with diverse data estates need a centralized catalog to help data engineers, analysts, and compliance officers understand where data resides, how it flows, and what its sensitivity level is.

Azure Key Vault, option A, stores secrets, certificates, and keys, but does not provide cataloging, classification, or lineage tracking.

Azure Firewall, option B, is a network protection service. It does not manage datasets, schemas, metadata, or catalogs.

Azure Monitor, option D, tracks metrics and logs for performance monitoring, not data governance.

Purview provides automated scanning of Azure SQL, Synapse, Data Factory pipelines, Data Lake, Power BI, AWS S3, on-prem SQL Servers, and many other sources. It extracts schema metadata, applies built-in sensitivity classifiers, maps data lineage from ingestion through transformation to consumption, and provides a searchable catalog for business and technical users.

Because the scenario requires broad cataloging, lineage, classification, and integration across many Azure services, Microsoft Purview is the only service that meets all these requirements. Therefore, option C is correct.

Question 106:

A global manufacturing company is building a cloud-based analytics platform that ingests raw IoT sensor data into Azure Data Lake Storage. They want to transform this data using SQL, Python, and Spark, perform machine learning, and schedule collaborative notebooks for engineers and data scientists. The organization also needs optimized clusters, auto-scaling, Delta Lake support, and a unified workspace for all teams. Which Azure service best fits these requirements?

Answer:

A) Azure Synapse Serverless SQL Pool
B) Azure Databricks
C) Azure Stream Analytics
D) Azure SQL Managed Instance

Answer: B

Explanation:

Azure Databricks is the correct answer because it provides a unified analytics workspace for data engineers, data scientists, and analysts to collaborate through notebooks using SQL, Python, Scala, and R. The scenario specifically describes ingestion of IoT sensor data into Azure Data Lake Storage, followed by transformations, machine learning, and collaborative notebooks. Databricks is built exactly for these tasks through its optimized Apache Spark clusters, Delta Lake integration, MLflow support, and workspace-based collaboration model.

Option A, Azure Synapse Serverless SQL Pool, allows ad hoc SQL queries over lake files but does not support Python notebooks, Spark-based machine learning, optimized clusters, or scheduled jobs in the same comprehensive manner that Databricks does. It is useful for analysts but not suitable for a full-scale engineering and ML environment requiring multi-language programming.

Option C, Azure Stream Analytics, supports real-time event processing but cannot provide a collaborative workspace or handle large-scale batch transformations, ML training, or notebook scheduling. Stream Analytics focuses on streaming windows, not lakehouse-style batch and ML workloads.

Option D, Azure SQL Managed Instance, is a relational database engine built for transactional workloads, lift-and-shift migrations, and full SQL Server compatibility. It cannot process raw IoT files from data lakes, cannot run Spark workloads, and provides no collaborative notebook experience.

Azure Databricks is the only option that combines Spark compute, ML capabilities, auto-scaling clusters, and Delta Lake support. Its collaborative workspace allows multiple departments to work together with governed access. Its integration with Delta Lake ensures ACID transactions, schema enforcement, and time travel, making IoT data pipelines reliable and versioned. Additionally, Databricks supports mounting Azure Data Lake to its workspace, enabling efficient read-write operations on massive datasets.

Databricks also provides job scheduling, advanced cluster management, and MLflow for experiment tracking, model packaging, and deployment workflows. The ability to process IoT sensor data at large scale, train ML models such as predictive maintenance or anomaly detection, and publish results into downstream systems makes Databricks the most comprehensive fit for this scenario.
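
For illustration, here is a minimal MLflow sketch of experiment tracking as it might run in a Databricks notebook; the dataset path, feature names, and hyperparameters are hypothetical.

```python
# Sketch: tracking a predictive-maintenance experiment with MLflow in a Databricks notebook.
# Dataset path, feature names, and hyperparameters are hypothetical.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_parquet("/dbfs/mnt/lake/silver/sensor_features.parquet")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["failure"]), df["failure"], test_size=0.2, random_state=42
)

with mlflow.start_run(run_name="rf-predictive-maintenance"):
    n_estimators = 200
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Parameters, metrics, and the trained model are versioned per run by the tracking server.
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")
```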

Thus, option B is correct.

Question 107:

A healthcare provider wants to protect extremely sensitive patient data stored in Azure SQL Database. They require encryption in use, in transit, and at rest, and want to ensure that even database administrators cannot read sensitive values. Only approved client applications should be able to decrypt the information using client-side keys. Which Azure SQL feature meets this strict confidentiality requirement?

Answer:

A) Dynamic Data Masking
B) Always Encrypted
C) Row-Level Security
D) Transparent Data Encryption

Answer: B

Explanation:

Always Encrypted is the correct solution because it ensures that sensitive data is encrypted at rest, in transit, and most importantly, in use within the database. The key requirement in the scenario is that even database administrators should not be able to view decrypted values. Always Encrypted achieves this by using client-side encryption keys stored outside the database engine. As a result, the database only ever sees ciphertext and cannot decrypt the values, even during query execution.

Option A, Dynamic Data Masking, only hides sensitive data in query results but does not encrypt data or prevent administrators from viewing actual values. Masking can be bypassed by users with elevated privileges and is not designed for strict confidentiality.

Option C, Row-Level Security, restricts row access based on user filters but does not protect the content of individual fields. It provides data segmentation but does not encrypt or hide values.

Option D, Transparent Data Encryption, protects data at rest by encrypting the physical database files, but the database engine still decrypts values for authorized users. This means administrators with access can still view the plaintext. TDE does not protect data in use.

Always Encrypted uses two types of encryption: deterministic and randomized. Deterministic encryption allows equality comparisons, enabling users to perform lookups on encrypted columns while maintaining confidentiality. Randomized encryption is more secure but does not support equality operations. Both methods ensure that sensitive patient data such as medical history, prescriptions, address information, and personal identifiers remain confidential.

Because client-side drivers encrypt and decrypt data outside the SQL engine, administrators performing maintenance tasks, backups, or tuning cannot read sensitive information. This satisfies strict regulatory requirements such as HIPAA and protects against insider threats.
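
As a sketch, the snippet below shows how an approved client application might query a deterministically encrypted column from Python through pyodbc and the Microsoft ODBC driver. The server, table, and column names are hypothetical, and the client identity is assumed to have access to the column master key (for example, in Azure Key Vault).

```python
# Sketch: querying an Always Encrypted column from an approved client application.
# With ColumnEncryption=Enabled, the ODBC driver encrypts parameters and decrypts results
# client-side; the SQL engine only ever sees ciphertext. Server, table, and column names
# are hypothetical, and the client must have access to the column master key.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=PatientDb;"
    "UID=app_user;PWD=<password>;"
    "ColumnEncryption=Enabled"
)
cursor = conn.cursor()

# Equality lookups work only on deterministically encrypted columns; the parameter value
# is encrypted by the driver before it is sent to the server.
cursor.execute(
    "SELECT PatientId, DiagnosisCode FROM dbo.Patients WHERE NationalId = ?",
    ("123-45-6789",),
)
for patient_id, diagnosis in cursor.fetchall():
    print(patient_id, diagnosis)
```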

Thus, option B is correct.

Question 108:

A company wants to deploy an enterprise-wide data catalog that automatically scans Azure SQL Database, Azure Synapse Analytics, Azure Data Lake Storage, on-premises SQL Servers, Power BI workspaces, and third-party cloud platforms. They want governance features including lineage visualization, sensitivity labeling, metadata search, and automated classification. Which Azure service should they implement?

Answer:

A) Azure Automation
B) Azure Monitor
C) Microsoft Purview
D) Azure Key Vault

Answer: C

Explanation:

Microsoft Purview is the correct solution because it provides enterprise-grade data governance capabilities across Azure, on-premises, and external cloud sources. The scenario requires automated scanning, classification, lineage tracking, and cataloging across a diverse data landscape. Purview is designed specifically for these governance tasks.

Option A, Azure Automation, is used for scripting and automating operational tasks, not for data governance, cataloging, or lineage analysis.

Option B, Azure Monitor, tracks metrics, logs, and alerts for Azure resources but cannot classify or catalog datasets.

Option D, Azure Key Vault, stores keys, secrets, and certificates but does not scan or analyze data sources.

Purview integrates with Azure SQL, Synapse, Data Lake Storage, Power BI, SAP, Amazon S3, and more. It extracts metadata and builds a centralized catalog that business users and analysts can search. Lineage tracking shows how data flows from ingestion to transformation to reports. Classification identifies sensitive fields such as financial data or personal information, helping organizations comply with regulations like GDPR or HIPAA.

Purview equips governance officers, architects, and analysts with visibility into data assets across the enterprise, enabling trust and consistency.

Thus, option C is correct.

Question 109:

A transportation analytics team wants to train large-scale machine learning models using distributed compute. They need a platform that supports Spark MLlib, Python, notebooks, and integration with their existing data lake. They also need experiment tracking and cluster scaling. Which Azure service provides all these features?

Answer:

A) Azure Kubernetes Service
B) Azure Databricks
C) Azure Cosmos DB
D) Azure SQL Database

Answer: B

Explanation:

Azure Databricks is the correct answer because it provides a unified analytics environment optimized for distributed machine learning using Spark MLlib and Python. The scenario explicitly mentions distributed training, notebooks, integration with data lakes, experiment tracking, and cluster scaling. Databricks supports all of these features through collaborative notebooks, autoscaling clusters, MLflow integration, and seamless access to Azure Data Lake Storage.

Azure Kubernetes Service, option A, can run ML workloads but requires significant custom configuration and does not provide built-in notebooks, Spark optimization, or MLflow out of the box.

Azure Cosmos DB, option C, is a NoSQL database and does not support distributed compute or ML workloads.

Azure SQL Database, option D, is a relational database service and cannot train machine learning models or run distributed Spark jobs.

Databricks offers optimized Spark runtimes, collaborative workspaces, and scalable compute clusters. MLflow provides experiment tracking, model versioning, packaging, and deployment. These features allow data scientists to manage their ML lifecycle efficiently.

Thus, option B is correct.

Question 110:

A business intelligence team needs to perform ad hoc queries over CSV and JSON files stored in Azure Data Lake Storage without provisioning dedicated compute. They want a pay-per-query model using T-SQL and the ability to join multiple lake-based datasets. Which Azure service fulfills these requirements?

Answer:

A) Azure SQL Managed Instance
B) Azure Stream Analytics
C) Azure Synapse Serverless SQL Pool
D) Azure Database for MySQL

Answer: C

Explanation:

Azure Synapse Serverless SQL Pool (Option C) is the correct answer because it enables users to query data directly from data lakes without requiring them to provision or manage dedicated compute resources. This service is designed for on-demand data exploration, ad hoc querying, and analytics over files stored in Azure Data Lake Storage using familiar T-SQL syntax. It is ideal for scenarios where organizations need quick insights from large volumes of unstructured or semi-structured data such as CSV, Parquet, and JSON. Because it is serverless, you only pay for the amount of data processed by each query, making it cost-efficient for workloads that are intermittent rather than continuous. This flexibility and pay-per-use model differentiate Synapse Serverless SQL Pool from traditional database engines.

Azure SQL Managed Instance (Option A) is a fully managed PaaS solution that offers near 100% compatibility with on-premises SQL Server. It is ideal for migrating SQL Server workloads to the cloud with minimal changes. While it provides powerful relational database capabilities, it is not optimized for on-demand analytics directly over data lake files or for ad hoc querying of large unstructured datasets. It is best suited for OLTP workloads, application backends, and migrations—not for serverless lake-based analytics.

Azure Stream Analytics (Option B) is a real-time event processing and analytics service. It is designed to analyze data in motion from sources like Event Hubs and IoT Hub. While Stream Analytics excels at real-time dashboards, anomaly detection, and streaming transformations, it is not used for querying static datasets in a data lake using SQL.

Azure Database for MySQL (Option D) is a managed relational database service for MySQL workloads. It is ideal for transactional applications built on the MySQL engine but does not support on-demand analytics, data lake querying, or serverless SQL capabilities.

Therefore, Azure Synapse Serverless SQL Pool (Option C) is the correct choice for scenarios involving serverless, on-demand querying of large datasets stored in Azure Data Lake.

Question 111:

A company wants to build a unified analytics architecture allowing SQL analysts, data engineers, and data scientists to transform data, run machine learning models, and manage structured and semi-structured files efficiently. They require a service that supports Spark-based compute, notebooks, automated job scheduling, and Delta Lake ACID transactions. Which Azure service should they choose?

Options:

A) Azure Synapse Serverless SQL Pool
B) Azure Databricks
C) Azure SQL Managed Instance
D) Azure Data Factory

Answer: B

Explanation:

Azure Databricks is the correct service because it provides a complete collaborative environment built on Apache Spark. It supports SQL, Python, Scala, and R within notebooks, making it ideal for teams composed of analysts, engineers, and data scientists. The scenario requires support for Delta Lake, job scheduling, machine learning workflows, and efficient handling of structured and semi-structured files, all of which Databricks delivers.

Azure Synapse Serverless SQL Pool allows SQL-based queries on data lake files but does not support Spark machine learning, notebooks across multiple languages, or Delta Lake ACID transactions. While useful for exploration, it lacks the broad compute and collaboration features described.

Azure SQL Managed Instance is an operational relational database, not a distributed analytics engine. It cannot process parquet, JSON, or Delta Lake files and is not designed for ML workflows or large-scale distributed transformations.

Azure Data Factory supports ETL orchestration and mapping data flows but does not provide Spark notebooks, machine learning frameworks, Delta Lake optimization, or analytics collaboration needed for the data science environment described.

Databricks integrates natively with Azure Data Lake Storage, allowing efficient ETL and ML workloads on parquet and Delta Lake files. Delta Lake adds schema enforcement, ACID guarantees, time travel, and scalable metadata management, making it ideal for medallion architecture pipelines. Databricks jobs automate ETL and ML tasks, while MLflow enables experiment tracking and model versioning.

Because the scenario demands Spark-based distributed compute, notebook collaboration, ML support, scheduling, and Delta Lake integration, Databricks is the only choice meeting all requirements. Thus, option B is correct.

Question 112:

A financial analytics team needs a system capable of real-time fraud detection. They must process event streams from thousands of transactions per second, perform sliding window aggregations, detect anomalies, and forward alerts to downstream apps. They require a SQL-like query model and sub-second latency. Which Azure service best meets this requirement?

Answer:

A) Azure Data Factory
B) Azure Stream Analytics
C) Azure Event Hubs
D) Azure SQL Database

Answer: B

Explanation:

Azure Stream Analytics (Option B) is the correct choice because it is a fully managed, real-time stream processing engine built for exactly this workload. It consumes event streams from sources such as Event Hubs or IoT Hub, applies a SQL-like query language to data in motion, and delivers results to downstream applications, alerts, storage, or dashboards with sub-second latency. Its query language natively supports tumbling, hopping, and sliding windows, so sliding window aggregations over thousands of transactions per second are straightforward, and it includes built-in anomaly detection functions as well as joins across streams and reference data. Outputs can be routed to Azure Functions, Event Hubs, Service Bus, databases, or Power BI, making it easy to forward fraud alerts to downstream apps.

Azure Data Factory (Option A) is a batch-oriented ETL and orchestration service. It moves and transforms data on scheduled or event-based triggers, but it cannot run continuous, sub-second queries over live transaction streams.

Azure Event Hubs (Option C) is a high-throughput ingestion service that can capture millions of events per second and frequently serves as the input to a Stream Analytics job. However, Event Hubs only ingests and buffers events; it does not provide a SQL-like query language, windowed aggregations, or anomaly detection on its own.

Azure SQL Database (Option D) is a relational database designed for transactional workloads and structured data. It is not built for continuous stream processing, and running sliding window analytics over thousands of inserts per second would create performance bottlenecks.

Because the scenario requires continuous processing of transaction streams with a SQL-like query model, sliding windows, anomaly detection, and sub-second latency, Azure Stream Analytics (Option B) is the correct answer.

Question 113:

A company wants to implement the medallion architecture inside its Azure Data Lake. They need a technology that supports schema enforcement, versioning, ACID transactions, scalable metadata management, and provides reliability for bronze, silver, and gold layers. Which technology should they adopt?

Options:

A) CSV files
B) JSON files
C) Delta Lake
D) Azure SQL Database

Answer: C

Explanation:

Delta Lake is the correct choice because it adds transactional reliability, schema enforcement, version control, and ACID guarantees on top of data lakes. A medallion architecture requires strict governance as data flows from the raw bronze layer to the cleansed silver layer and finally to the aggregated gold layer. Without ACID transactions, corrupted writes or invalid schema changes can break downstream analytics. Delta Lake solves this through a commit log, enabling atomic operations, time travel, and schema evolution.

CSV files lack schema management, ACID transactions, and versioning. They are prone to corruption if writes fail mid-process and cannot support large, structured medallion pipelines.

JSON files support semi-structured data but also lack ACID semantics and schema control. Using JSON for medallion architecture would introduce inconsistent formatting across layers.

Azure SQL Database is strong for relational transactional workloads but cannot serve as a distributed file-based architecture for medallion layers. It is not optimized for lakehouse architectures or multi-terabyte files stored in data lakes.

Delta Lake combines the performance of parquet with the reliability features needed for production analytics. Its support for streaming and batch operations makes it ideal for pipelines that evolve over time. Time travel enables debugging by querying previous table versions. Schema enforcement prevents bad data from entering refined layers.

Therefore, Delta Lake is the only choice that satisfies all medallion architecture requirements. Option C is correct.

Question 114:

A logistics provider needs a globally distributed database that supports multi-master writes, automatic failover, high availability, and tunable consistency levels for handling shipment updates across regions. Which Azure service is designed for these capabilities?

Options:

A) Azure SQL Database
B) Azure SQL Managed Instance
C) Azure Cosmos DB
D) Azure PostgreSQL Flexible Server

Answer: C

Explanation:

Azure Cosmos DB is the correct service because it provides multi-region distribution, multi-master writes, tunable consistency levels, and automatic failover. For logistics systems that track shipments in real time across multiple continents, these capabilities are essential. Cosmos DB guarantees single-digit millisecond latency at the 99th percentile for reads and writes, allowing global applications to update and access data consistently.

Azure SQL Database offers geo-replication but not multi-master writes or globally distributed consistency tuning.

Azure SQL Managed Instance supports read replicas and high availability but is not engineered for globally distributed systems requiring low latency in many regions simultaneously.

Azure PostgreSQL Flexible Server does not support global multi-master writes.

Cosmos DB provides five consistency models, allowing developers to balance availability and accuracy based on business rules. Its partitioning ensures horizontal scalability to handle massive amounts of operational data across fleets of vehicles. Because shipment events can originate from any region, multi-master writes prevent bottlenecks and improve resilience.
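
For illustration, here is a minimal sketch using the azure-cosmos Python SDK to write and query shipment documents; the endpoint, key, database, container, and partition key are hypothetical.

```python
# Sketch: writing and reading shipment events with the azure-cosmos Python SDK.
# Endpoint, key, database, container, and partition key are hypothetical.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://myaccount.documents.azure.com:443/", credential="<account-key>")
database = client.create_database_if_not_exists("logistics")
container = database.create_container_if_not_exists(
    id="shipments",
    partition_key=PartitionKey(path="/shipmentId"),
)

# Upserts can be issued from any write region when multi-region (multi-master) writes are enabled.
container.upsert_item({
    "id": "evt-001",
    "shipmentId": "SHIP-12345",
    "status": "InTransit",
    "region": "westeurope",
})

# Query shipment history; supplying the partition key keeps the lookup scoped to one partition.
for item in container.query_items(
    query="SELECT * FROM c WHERE c.shipmentId = @sid",
    parameters=[{"name": "@sid", "value": "SHIP-12345"}],
    partition_key="SHIP-12345",
):
    print(item["id"], item["status"])
```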

Thus, option C is the correct choice.

Question 115:

A company needs scheduled ETL pipelines that ingest SaaS data, transform it using a visual flow interface, and load results into Azure Synapse Analytics. They need monitoring, alerts, triggers, and integration with on-prem sources through integration runtime. Which Azure service fits this ETL requirement?

Answer:

A) Azure Stream Analytics
B) Azure Data Factory
C) Azure Databricks
D) Azure Monitor

Answer: B

Explanation:

Azure Data Factory is the appropriate service because it provides a fully managed ETL orchestration platform. It supports hundreds of SaaS and database connectors, mapping data flows for visual transformations, scheduling, triggers, monitoring, and on-prem connectivity using integration runtime. Data engineers can design pipelines to ingest data from sources like Salesforce, Dynamics, Google Analytics, SAP, and more.

Azure Stream Analytics is for real-time event processing and cannot orchestrate scheduled ETL pipelines.

Azure Databricks can perform transformations but is not designed for pipeline scheduling, monitoring, or integration runtime functionality. Databricks jobs can complement Data Factory, but they do not replace its orchestration capabilities.

Azure Monitor tracks metrics and logs but does not handle ETL tasks.

ADF’s visual data flows run on Spark clusters behind the scenes, enabling scalable transformations without writing code. With full monitoring, retries, alerts, and event triggers, it provides everything needed for enterprise ETL.
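
Pipelines are normally authored visually in ADF Studio, but they can also be started and monitored programmatically. The sketch below uses the azure-mgmt-datafactory and azure-identity packages with hypothetical subscription, resource group, factory, and pipeline names.

```python
# Sketch: starting and monitoring an ADF pipeline run programmatically.
# Subscription, resource group, factory, pipeline names, and parameters are hypothetical;
# the pipeline itself would typically be authored in ADF Studio.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="analytics-rg",
    factory_name="corp-data-factory",
    pipeline_name="IngestSalesforceToSynapse",
    parameters={"loadDate": "2024-01-01"},
)

status = adf_client.pipeline_runs.get("analytics-rg", "corp-data-factory", run.run_id)
print(status.status)  # e.g., Queued, InProgress, Succeeded, Failed
```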

Thus, option B is correct.

Question 116:

A retail corporation needs a massively parallel processing (MPP) data warehouse capable of handling petabytes of structured data, performing complex joins, aggregations, and indexing, and powering large BI workloads. They also require workload management and distributed compute. Which Azure service should they choose?

Answer:

A) Azure SQL Database
B) Azure Synapse Dedicated SQL Pool
C) Azure Cosmos DB
D) Azure Data Factory

Answer: B

Explanation:

Azure Synapse Dedicated SQL Pool is the correct answer because it is Azure’s enterprise-grade MPP data warehouse solution. It distributes data across multiple compute nodes so that complex analytical workloads on massive datasets run in parallel. Fact-dimension schemas, high-volume aggregations, and large joins perform efficiently through parallel execution plans.

Azure SQL Database is powerful but designed for OLTP workloads and not distributed MPP analytics.

Azure Cosmos DB is a NoSQL operational database, not a SQL-based analytical warehouse.

Azure Data Factory orchestrates data pipelines and cannot perform the computational work of a data warehouse.

Dedicated SQL Pool supports distribution methods, materialized views, columnstore indexes, and workload isolation. It integrates tightly with Power BI and offers elastic scaling of compute independent of storage.
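
As an illustration, the sketch below creates a hash-distributed fact table with a clustered columnstore index using a CTAS statement, submitted here through pyodbc; the workspace, table, and column names are hypothetical.

```python
# Sketch: creating a hash-distributed, columnstore fact table in a Synapse Dedicated SQL Pool.
# The CTAS statement is standard dedicated-pool T-SQL; workspace, table, and column names
# are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=SalesDW;"
    "UID=sqladminuser;PWD=<password>"
)
conn.autocommit = True  # run the DDL in autocommit mode

ctas = """
CREATE TABLE dbo.FactSales
WITH
(
    DISTRIBUTION = HASH(ProductKey),   -- co-locate rows that join on ProductKey
    CLUSTERED COLUMNSTORE INDEX        -- compressed columnar storage for large scans
)
AS
SELECT s.SaleId, s.ProductKey, s.StoreKey, s.SaleDate, s.Quantity, s.Amount
FROM stg.Sales AS s;
"""
conn.cursor().execute(ctas)
```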

Thus, option B is correct.

Question 117:

A company needs a pay-per-query SQL engine for ad hoc exploration of files stored in Azure Data Lake Storage. They want to avoid provisioning dedicated compute and instead query parquet, CSV, or JSON directly using T-SQL. Which Azure service meets this requirement?

Options:

A) Azure Synapse Serverless SQL Pool
B) Azure SQL Managed Instance
C) Azure Databricks
D) Azure Stream Analytics

Answer: A

Explanation:

Azure Synapse Serverless SQL Pool is the correct choice because it allows querying of parquet, CSV, and JSON files stored in the data lake using familiar T-SQL without provisioning infrastructure. The service charges per terabyte of data processed, making it ideal for ad hoc exploration. Analysts can use OPENROWSET and external tables to query unstructured and semi-structured data.
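
For illustration, here is a minimal OPENROWSET sketch run against the serverless SQL endpoint via pyodbc; the workspace endpoint, storage path, column names, and credentials are hypothetical.

```python
# Sketch: ad hoc OPENROWSET query over lake files through the Synapse serverless SQL endpoint.
# Workspace endpoint, storage path, column names, and credentials are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;DATABASE=master;"
    "UID=sqladminuser;PWD=<password>"
)

query = """
SELECT TOP 10 result.country, COUNT(*) AS orders
FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/lake/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS result
GROUP BY result.country
ORDER BY orders DESC;
"""

# Billing is based on the data scanned by the query, not on provisioned compute.
for country, orders in conn.cursor().execute(query).fetchall():
    print(country, orders)
```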

Azure SQL Managed Instance requires data ingestion into tables and cannot query raw lake files.

Azure Databricks offers powerful Spark SQL but requires clusters, not a serverless pay-per-query model.

Azure Stream Analytics is for real-time processing, not ad hoc querying.

Synapse Serverless is a low-cost, flexible model for interactive exploration of data lakes, making option A correct.

Question 118:

A healthcare company must ensure sensitive PHI data is protected inside Azure SQL. They want encryption at rest, in transit, and during query execution. They also want to prevent database administrators from viewing decrypted patient data. Which feature should they implement?

Options:

A) Transparent Data Encryption
B) Always Encrypted
C) Dynamic Data Masking
D) Row-Level Security

Answer: B

Explanation:

Always Encrypted is the correct answer because it encrypts data at rest, in transit, and in use. Since the encryption and decryption occur on the client side, the SQL engine never sees plaintext values. This prevents DBAs, attackers, or insiders from accessing decrypted PHI data. It provides strong protection aligned with HIPAA compliance.

Transparent Data Encryption only protects data at rest.

Dynamic Data Masking hides data in query results but does not prevent privileged users from viewing actual values.

Row-Level Security restricts row access but does not secure data within a row.

Only Always Encrypted satisfies all security requirements. Thus, option B is correct.

Question 119:

A BI team wants a real-time Power BI dashboard that updates every second using streaming data. They need a service that supports SQL-based streaming queries and direct Power BI output. Which Azure service should they use?

Options:

A) Azure Stream Analytics
B) Azure Databricks
C) Azure SQL Database
D) Azure Cosmos DB

Answer: A

Explanation:

Azure Stream Analytics is the correct tool because it has a built-in output connector specifically for Power BI streaming datasets. It provides SQL-like streaming queries and real-time windowing functions that allow dashboards to refresh every second. Stream Analytics integrates with Event Hub or IoT Hub and sends results directly to Power BI dashboards.

Databricks can perform streaming but lacks native Power BI streaming output.

SQL Database cannot push real-time updates at sub-second intervals.

Cosmos DB supports operational queries but not streaming analytics or dashboard push capabilities.

Thus, option A is correct.

Question 120:

A security team wants to perform large-scale log analytics across terabytes of security events. They need a specialized engine optimized for time-series queries, full-text search, pattern matching, and fast ingestion. Which Azure service should they select?

Options:

A) Azure Data Factory
B) Azure Synapse Dedicated SQL Pool
C) Azure Data Explorer
D) Azure SQL Managed Instance

Answer: C

Explanation:

Azure Data Explorer is the correct choice because it is designed specifically for large-scale telemetry, log analytics, and time-series data. It uses columnar storage and advanced indexing to query billions of records in seconds. Its Kusto Query Language is optimized for searching logs, performing pattern detection, running window functions, and detecting anomalies.

Synapse Dedicated SQL Pool is optimized for structured data warehousing, not log analytics.

SQL Managed Instance is not suitable for fast ingestion or time-series workloads.

Data Factory cannot execute analytical queries.

Because the team needs log analytics, time-series analysis, and fast ingestion, Azure Data Explorer is the optimal solution.

 
