Microsoft DP-900 Azure Data Fundamentals Exam Dumps and Practice Test Questions, Set 4 (Questions 61–80)

Visit here for our full Microsoft DP-900 exam dumps and practice test questions.

Question 61:

A company stores large volumes of semi-structured data such as JSON and CSV files in Azure Data Lake and wants to load this data into a relational model for analytics. They need a service capable of running scalable distributed SQL transformations directly on the data lake before loading it into a warehouse. Which Azure service should they use?

Options:

A) Azure SQL Managed Instance
B) Azure Synapse Serverless SQL Pool
C) Azure Synapse Dedicated SQL Pool
D) Azure Data Factory Mapping Data Flows

Answer: D

Explanation:

Azure Data Factory Mapping Data Flows is the correct solution because it provides a visual, scalable, and fully managed ETL engine capable of performing distributed SQL-style transformations on data stored in Azure Data Lake. The scenario describes a company that needs to transform semi-structured data like JSON and CSV files into a relational model. Data Flows run on fully managed Spark clusters behind the scenes, enabling large-scale distributed transformations without requiring the user to write Spark code. This makes it particularly powerful for data engineering teams that prefer a GUI-based transformation approach.

Azure SQL Managed Instance, option A, is a relational database engine that is not designed to transform large quantities of semi-structured files directly in a data lake. Although it offers some PolyBase-like capabilities, Managed Instance is not suitable for massive-scale distributed transformations.

Azure Synapse Serverless SQL Pool, option B, can query semi-structured data directly, but it is designed for on-demand querying and not for orchestrated ETL pipelines with many transformation steps. It also lacks the scalable transformation patterns needed for full relational structuring workloads.

Azure Synapse Dedicated SQL Pool, option C, is built for structured analytical workloads and cannot directly transform raw semi-structured files without staging tables and preprocessing. It also incurs the cost of fully provisioned compute even when the pool is idle.

Mapping Data Flows allow the user to visually configure joins, filters, expressions, aggregates, pivots, sorts, and schema mapping in a drag-and-drop interface. All logic is executed using distributed Spark runtimes. This helps scale transformations that would be time-consuming or resource-intensive if done in a relational engine.
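Mapping Data Flows are configured visually rather than written as code, but the logic they execute closely resembles a Spark job. The following PySpark sketch, with hypothetical paths and column names, illustrates the kind of join, filter, and aggregate flow the service runs on its managed Spark runtime behind the scenes.

```python
# A PySpark sketch of the kind of distributed transformation a Mapping Data Flow
# performs; paths, container names, and columns are hypothetical. In Data Factory
# this logic is built in the visual designer rather than coded.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-to-relational").getOrCreate()

# Source steps: read semi-structured JSON and CSV files from the data lake.
orders = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/orders/")
customers = (spark.read.option("header", True)
             .csv("abfss://raw@mydatalake.dfs.core.windows.net/customers/"))

# Join, filter, and aggregate steps, mirroring the visual transformations.
daily_sales = (orders
               .join(customers, "customer_id")
               .filter(F.col("status") == "completed")
               .groupBy("region", "order_date")
               .agg(F.sum("amount").alias("total_sales")))

# Sink step: write a relational-shaped result back to the lake for warehouse loading.
daily_sales.write.mode("overwrite").parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/daily_sales/")
```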

Additionally, Mapping Data Flows integrate seamlessly with Data Factory pipelines, providing end-to-end ETL orchestration. Users can schedule, trigger, and monitor transformations easily. The service also supports schema drift, which is important when processing semi-structured data whose structure may evolve over time.

Thus, option D is correct.

Question 62:

A business wants to run time-series analytics over log data ingested from multiple IoT devices. They require extremely fast ingestion, the ability to run advanced queries across billions of records, and a specialized engine designed for log analytics and operational telemetry. Which Azure service fulfills this need?

Options:

A) Azure Stream Analytics
B) Azure Databricks
C) Azure Data Explorer (Kusto)
D) Azure Synapse SQL Serverless

Answer: C

Explanation:

Azure Data Explorer (Kusto) is the correct solution because it is specifically engineered for log analytics, telemetry, time-series analysis, and large-volume ingestion scenarios. The scenario describes IoT log data, which can easily grow to billions of records. Data Explorer supports ultra-fast ingestion, often measured in gigabytes per second, and enables highly optimized queries across massive datasets. Its KQL (Kusto Query Language) is designed for advanced time-series analytics, anomaly detection, trend analysis, window functions, joins, and aggregations.
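As a rough illustration, the sketch below uses the azure-kusto-data Python client to run a simple KQL time-series aggregation; the cluster URL, database, and table names are hypothetical.

```python
# Hypothetical cluster, database, and table names; a sketch of running a KQL
# time-series query from Python with the azure-kusto-data package.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://mycluster.westeurope.kusto.windows.net")
client = KustoClient(kcsb)

# Hourly average temperature per device over the last day; bin() provides the
# time-series bucketing that KQL is designed for.
query = """
DeviceTelemetry
| where Timestamp > ago(1d)
| summarize AvgTemp = avg(Temperature) by DeviceId, bin(Timestamp, 1h)
| order by DeviceId asc, Timestamp asc
"""

response = client.execute("TelemetryDb", query)
for row in response.primary_results[0]:
    print(row["DeviceId"], row["Timestamp"], row["AvgTemp"])
```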

Azure Stream Analytics, option A, is optimized for real-time processing but not for large-scale historical querying or storing billions of records indefinitely. It excels at data in motion, not deep analytics on stored telemetry.

Azure Databricks, option B, offers powerful Spark processing and can handle large datasets but is less optimized for fast ad-hoc log analytics. It requires provisioning clusters and writing code in Spark SQL or Python, which may introduce additional overhead.

Azure Synapse SQL Serverless, option D, is ideal for lightweight exploration of data in storage but cannot match the ingestion speed, optimizations, or time-series analysis capabilities of Data Explorer.

Azure Data Explorer’s columnar indexing, caching, compression, and optimized engine make it ideal for operational telemetry workloads. It supports near real-time dashboards, anomaly detection queries, and integration with Azure Monitor, IoT Hub, Event Hub, and Log Analytics. It is widely used for analyzing application performance, IoT telemetry, security logs, and high-volume events.

Thus, option C is correct.

Question 63:

A global enterprise requires a relational database platform that provides automatic backups, built-in high availability, point-in-time restore, and compatibility with SQL Server features such as SQL Agent, linked servers, and native cross-database queries. Which Azure service meets these requirements?

Options:

A) Azure SQL Database Single Database
B) Azure SQL Managed Instance
C) Azure Synapse Dedicated SQL
D) Azure Cosmos DB

Answer: B

Explanation:

Azure SQL Managed Instance is the correct choice because it offers near-full compatibility with SQL Server, including SQL Agent, linked servers, cross-database queries, and instance-level features. The scenario emphasizes compatibility with traditional SQL Server functionalities that are not fully available in Azure SQL Database Single Database. Managed Instance also provides automatic backups, built-in high availability, failover groups, and point-in-time restore.

Azure SQL Database Single Database, option A, provides many relational capabilities but lacks full SQL Server compatibility, such as cross-database queries and SQL Agent. It is also deployed at the database level, so instance-level features are unavailable.

Azure Synapse Dedicated SQL, option C, is primarily an analytical data warehouse engine and not meant for OLTP workloads or legacy SQL Server compatibility.

Azure Cosmos DB, option D, is NoSQL and not a relational database. It does not support SQL Agent, cross-database queries, or SQL Server compatibility.

Managed Instance provides a seamless bridge for organizations migrating from on-premises SQL Server to Azure. It supports instance-level features, CLR integration, replication, and compatibility for legacy workloads. It is designed for applications requiring minimal changes during migration.

Thus, option B is correct.

Question 64:

A company needs to securely store connection strings, credentials, API keys, and other secrets used by their analytics pipelines. They also want to rotate secrets automatically and grant access based on Azure AD identities. Which Azure service should they use?

Options:

A) Azure Key Vault
B) Azure Storage Account
C) Azure Monitor
D) Azure Sentinel

Answer: A

Explanation:

Azure Key Vault is the correct service because it securely stores secrets, keys, and certificates for applications and analytics pipelines. The scenario requires secure storage of sensitive credentials, automatic rotation, and identity-based access control using Azure AD. Key Vault supports all of these requirements. It provides role-based access, logging, encryption, and seamless integration with Data Factory, Synapse, Databricks, Azure Functions, and other services.

Azure Storage Account, option B, can store files and blobs but is not designed for secure secret management, automatic rotation, or centralized secret governance.

Azure Monitor, option C, provides monitoring and logging but not secret storage.

Azure Sentinel, option D, is a SIEM tool for security analytics, not credential storage.

Key Vault reduces the need for embedding secrets in pipeline configurations or code. It supports managed identities, enabling services to authenticate without storing credentials. It also supports secret versioning, hardware security modules, and audit logs.
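A minimal sketch of this pattern, assuming a hypothetical vault URL and secret name, retrieves a connection string at runtime with the azure-identity and azure-keyvault-secrets packages instead of hard-coding it:

```python
# Minimal sketch: fetch a secret with a managed identity. Vault URL and secret
# name are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # resolves to the managed identity when running in Azure
client = SecretClient(vault_url="https://my-keyvault.vault.azure.net", credential=credential)

# Retrieve the current version of a connection-string secret at runtime instead
# of embedding it in pipeline configuration or code.
secret = client.get_secret("sql-connection-string")
print(secret.name, secret.properties.version)
```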

Thus, option A is the correct answer.

Question 65:

An analytics team wants to use Power BI to analyze large datasets stored in Azure Synapse Dedicated SQL Pool. They require the highest performance possible for aggregations, semantic models, and compressed in-memory caching. Which Power BI mode should they choose?

Options:

A) DirectQuery Mode
B) Import Mode
C) Composite Mode
D) Live Connection

Answer: B

Explanation:

Import Mode is the correct choice because it provides the fastest query performance by loading compressed, columnar data into Power BI’s in-memory engine (VertiPaq). The scenario describes a need for high-performance aggregations and semantic modeling over large datasets. Import Mode is optimized for this scenario by pre-loading data, enabling lightning-fast calculations, DAX queries, and dashboard interactions.

DirectQuery Mode, option A, queries the underlying Synapse engine in real time. While useful for very large datasets, it cannot match the performance of in-memory analytics and may introduce latency.

Composite Mode, option C, combines Import and DirectQuery but still does not provide the same consistent high-speed experience as full Import Mode.

Live Connection, option D, is used primarily with Analysis Services, not Synapse Dedicated SQL.

Import Mode uses VertiPaq compression to dramatically reduce memory usage and improve performance. It supports advanced DAX measures, relationships, hierarchies, and complex semantic modeling. For high-performance BI over structured warehouse data, Import Mode is the industry-standard choice.

Thus, option B is correct.

Question 66:

A company wants to allow analysts to explore large volumes of structured and semi-structured data stored in Azure Data Lake using T-SQL without provisioning dedicated compute resources. They want a solution that automatically scales, charges only per query, and supports external tables over parquet and CSV files. Which Azure service should they use?

Options:

A) Azure SQL Database
B) Azure Synapse Serverless SQL Pool
C) Azure Databricks
D) Azure HDInsight

Answer: B

Explanation:

Azure Synapse Serverless SQL Pool is the correct choice because it delivers on-demand distributed query processing against data stored directly in Azure Data Lake. The company’s requirement is to explore large bodies of structured and semi-structured data using T-SQL without provisioning dedicated compute or managing clusters. Serverless SQL Pool provides a truly serverless architecture, meaning no pre-provisioned compute is required. Users pay only for the amount of data processed by each query, making it ideal for ad-hoc data exploration and cost-efficient analytics.

Azure SQL Database, option A, is a fully managed relational database that cannot directly query raw files in a data lake in a serverless manner. It requires data ingestion into relational tables before analysis.

Azure Databricks, option C, provides Spark-based compute and can query data lake files, but clusters must be provisioned and managed. It is not serverless and does not provide pure T-SQL querying for semi-structured files.

Azure HDInsight, option D, supports big data workloads including Hive and Spark, but requires cluster provisioning and management, making it unsuitable for pay-per-query serverless analytics.

Synapse Serverless SQL Pool enables creation of external tables referencing data lake files such as parquet, CSV, and JSON. Analysts can run SQL queries directly against these files without loading data. This is extremely useful in scenarios where teams need fast insights without extensive ETL pipelines. Serverless SQL also integrates with Azure Active Directory, supports views and rich analytical SQL constructs, and allows straightforward connectivity from tools like Power BI.
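As an illustration, the following sketch submits an ad-hoc OPENROWSET query over parquet files from Python via pyodbc; the workspace endpoint, storage path, and columns are hypothetical, and the same T-SQL could be run directly from Synapse Studio or Power BI.

```python
# Sketch of an ad-hoc serverless query over parquet files using OPENROWSET,
# submitted through pyodbc. Endpoint, path, and columns are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=master;Authentication=ActiveDirectoryInteractive;"
)

sql = """
SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/curated/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS src
GROUP BY region;
"""

# Billing is based on the data scanned by this query; no pool is provisioned.
for region, order_count, total_amount in conn.execute(sql).fetchall():
    print(region, order_count, total_amount)
```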

The ability to avoid cluster management while still benefiting from distributed SQL performance makes Synapse Serverless SQL Pool a key part of modern lakehouse architectures. Since the company’s needs include pay-per-query billing, automatic scaling, and SQL-based access to data lake files, Serverless SQL Pool is the best match.

Thus, option B is the correct choice.

Question 67:

An organization needs to capture high-volume, real-time telemetry from mobile applications worldwide. They want to ingest millions of events per second and forward them to multiple downstream consumers for analytics, dashboards, and machine learning. Which Azure service is ideal for building this event ingestion pipeline?

Options:

A) Azure Event Grid
B) Azure Event Hubs
C) Azure Service Bus
D) Azure Data Factory

Answer: B

Explanation:

Azure Event Hubs is the correct service because it is built specifically for large-scale event ingestion scenarios requiring extremely high throughput and low-latency handling. The organization in the scenario needs to capture real-time telemetry from mobile applications around the world, and Event Hubs can ingest millions of events per second. It also supports partitioning, consumer groups, checkpointing, and integration with numerous analytics platforms including Azure Databricks, Azure Stream Analytics, and Azure Synapse Analytics.

Azure Event Grid, option A, is optimized for reactive event-driven architectures and routing lightweight events. While Event Grid can handle many events, it is not designed for sustained, high-throughput ingestion of telemetry streams at the scale described.

Azure Service Bus, option C, is designed for enterprise messaging, transactional delivery, and command patterns, not massive telemetry ingestion. It is best suited for decoupled microservices or business workflows.

Azure Data Factory, option D, handles data integration and ETL orchestration, but not real-time event ingestion.

Event Hubs is built on a distributed streaming architecture offering performance and availability guarantees for ingesting logs, telemetry, sensor data, clickstreams, and event streams. It follows Kafka ecosystem patterns and exposes a Kafka-compatible endpoint, which simplifies integration with many analytics systems. Consumer groups allow multiple independent readers to consume the same event stream in parallel, enabling real-time dashboards, machine learning models, anomaly detection systems, and archival processes simultaneously.
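A minimal producer sketch with the azure-eventhub Python package (connection string and hub name hypothetical) shows how telemetry events are batched and sent to the hub for downstream consumer groups to read:

```python
# Minimal Event Hubs producer sketch; connection string and hub name are
# hypothetical. Mobile clients would typically send via an SDK or gateway
# rather than a batch loop like this.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_NAMESPACE_CONNECTION_STRING>",
    eventhub_name="mobile-telemetry",
)

with producer:
    batch = producer.create_batch()
    for event in ({"user": "u1", "action": "tap"}, {"user": "u2", "action": "scroll"}):
        batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)  # each consumer group reads this stream independently
```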

For global telemetry ingestion, elastic scaling and geo-disaster recovery features allow Event Hubs to support worldwide applications with resilient architectures. Mobile app telemetry is often highly bursty, making Event Hubs an excellent match due to its ability to buffer large spikes in event volume.

Thus, option B is correct.

Question 68:

A financial analytics team needs to perform complex transformations on data stored in parquet files, including joins, aggregations, window functions, and machine learning model training using Python. They want a collaborative notebook interface and autoscaling compute clusters. Which Azure service best meets these requirements?

Options:

A) Azure Stream Analytics
B) Azure Synapse Dedicated SQL Pool
C) Azure Databricks
D) Azure App Service

Answer: C

Explanation:

Azure Databricks is the correct selection because it provides a fully managed, highly optimized Apache Spark environment with support for collaborative notebooks, distributed data processing, and machine learning using Python, SQL, Scala, and R. The scenario specifies parquet file transformations, complex SQL operations, and the need for machine learning training. Databricks is uniquely positioned to fulfill all these needs.

Azure Stream Analytics, option A, is limited to real-time streaming analytics using SQL-like logic and cannot perform large-scale batch transformations or machine learning tasks.

Azure Synapse Dedicated SQL Pool, option B, supports MPP analytical SQL workloads but cannot run machine learning code or Python-based ML pipelines directly on parquet files.

Azure App Service, option D, is used for hosting web applications and APIs, not big data analytics or machine learning workloads.

Databricks provides autoscaling clusters that optimize resource usage based on workload. It includes Delta Lake technology, which adds ACID transactions, concurrency control, versioning, and improved reliability on top of parquet data. Notebooks allow team members to collaborate in real time, mixing Python, SQL, and visualizations within the same environment.
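As a rough sketch of the workload described, the following PySpark snippet, which could run in a Databricks notebook, applies a window function over hypothetical parquet trade data and writes the result as a Delta table:

```python
# PySpark sketch for a Databricks notebook; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("portfolio-analytics").getOrCreate()

trades = spark.read.parquet("abfss://curated@mydatalake.dfs.core.windows.net/trades/")

# 30-row rolling average trade value per instrument, ordered by trade date.
w = Window.partitionBy("instrument_id").orderBy("trade_date").rowsBetween(-29, 0)

enriched = trades.withColumn("rolling_avg_value", F.avg("trade_value").over(w))

# Persist as Delta (available on Databricks clusters) for ACID guarantees and versioning.
enriched.write.mode("overwrite").format("delta").save(
    "abfss://curated@mydatalake.dfs.core.windows.net/trades_enriched/")
```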

For financial analytics teams dealing with complex models, Databricks integrates well with MLflow for model tracking, hyperparameter tuning, and deployment workflows. It also supports large distributed operations such as joins and window functions on massive datasets.

Thus, option C is correct.

Question 69:

A company wants to build a historical analytical dataset for BI reporting. They plan to use star schema design with dimension and fact tables to optimize aggregation performance. Which type of workload and Azure service combination is most appropriate for hosting this schema?

Options:

A) OLTP workload on Azure SQL Database
B) OLAP workload on Azure Synapse Dedicated SQL Pool
C) Time-series workload on Azure Data Explorer
D) NoSQL workload on Azure Cosmos DB

Answer: B

Explanation:

An OLAP workload hosted on Azure Synapse Dedicated SQL Pool is the best choice because the scenario describes traditional data warehouse modeling using star schema design. Fact and dimension tables are optimized for analytical queries that involve scanning, aggregations, rollups, drilldowns, and BI dashboards. Synapse Dedicated SQL Pool uses Massively Parallel Processing (MPP), which distributes large tables across multiple compute nodes, accelerating analytical performance for BI scenarios.

Option A, OLTP workload on Azure SQL Database, is not designed for star schema analytics. It is optimized for transactional operations with many small reads and writes, not large scans or aggregations typical of BI workloads.

Option C, time-series workload on Azure Data Explorer, excels at log and telemetry analytics but not relational star schemas.

Option D, NoSQL workload on Azure Cosmos DB, is built for operational key-value or document storage, not structured relational analytics.

Dedicated SQL Pool supports indexing strategies, columnstore compression, table distribution methods, and materialized views—all critical for implementing star schemas efficiently. BI tools like Power BI integrate seamlessly with Synapse, providing high-performing dashboards. The warehouse-first design aligns perfectly with enterprise historical analytical workloads.
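As a hedged example of the table design this enables, the DDL below creates a hypothetical hash-distributed fact table with a clustered columnstore index in a dedicated SQL pool; it is submitted here via pyodbc but can be run from any T-SQL client.

```python
# Sketch of fact-table DDL for a dedicated SQL pool. Table and column names are
# hypothetical; the connection string placeholder must be filled in.
import pyodbc

ddl = """
CREATE TABLE dbo.FactSales
(
    DateKey      INT            NOT NULL,
    ProductKey   INT            NOT NULL,
    StoreKey     INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL,
    Quantity     INT            NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),   -- spread the fact table across compute nodes
    CLUSTERED COLUMNSTORE INDEX        -- columnar compression for large analytical scans
);
"""

conn = pyodbc.connect("<DEDICATED_SQL_POOL_ODBC_CONNECTION_STRING>")
conn.execute(ddl)
conn.commit()
```

Dimension tables would typically be small enough to use replicated distribution, keeping joins against the hash-distributed fact table local to each node.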

Thus, option B is correct.

Question 70:

A data governance team needs to track lineage, classify sensitive information, manage data catalogs, and apply data discovery rules across Azure data assets including SQL databases, data lakes, and Power BI datasets. Which Azure service provides this centralized data governance capability?

Options:

A) Azure Monitor
B) Azure Purview (Microsoft Purview)
C) Azure Sentinel
D) Azure Security Center

Answer: B

Explanation:

Azure Purview, now known as Microsoft Purview, is the correct service because it provides centralized governance, cataloging, lineage tracing, classification, and discovery of data across Azure and external systems. The scenario specifically requires lineage tracking, classification of sensitive information, cataloging, and data discovery—Purview is designed for all these tasks.

Azure Monitor, option A, focuses on system performance monitoring and metrics collection, not data governance.

Azure Sentinel, option C, is a SIEM tool for threat detection and security analytics, not data cataloging or lineage management.

Azure Security Center, option D, helps assess cloud security posture but does not provide data governance or metadata cataloging.

Purview crawls data sources, extracts metadata, organizes datasets into categories, identifies sensitive content using built-in classifiers, and builds lineage maps showing how data moves through pipelines and transformations. This helps organizations maintain compliance, reduce risk, and enable analysts to find trusted datasets for analytics.

Thus, option B is correct.

Question 71:

A company needs to analyze data stored in Azure Data Lake using a service that supports high-performance queries with full-text search, time-series analysis, and anomaly detection. They also need very fast ingestion of logs and telemetry data, often reaching terabytes daily. Which Azure service best meets these requirements?

Options:

A) Azure Databricks
B) Azure Data Explorer (Kusto)
C) Azure SQL Managed Instance
D) Azure Data Factory

Answer: B

Explanation:

Azure Data Explorer, also known as Kusto, is the correct choice because it is optimized for analyzing large volumes of log, event, telemetry, time-series, and semi-structured data at extremely high speeds. The scenario describes a need to ingest massive amounts of telemetry data at scale, often reaching terabytes per day, and to run advanced analytical queries such as full-text search, anomaly detection, and time-series pattern analysis. Data Explorer is uniquely designed for this type of workload. It supports rapid ingestion capability, often measured in gigabytes per second, which is ideal for high-throughput telemetry and logs from applications, IoT devices, and distributed systems.

Azure Databricks, option A, can process large-scale data and supports machine learning, but it is not optimized for extremely fast ingestion of raw logs or interactive, low-latency querying. Databricks relies on Spark-based batch or structured streaming, which is powerful but not ideal for rapid ad-hoc log queries across billions of records.

Azure SQL Managed Instance, option C, is a relational database platform. Although it supports structured queries, it is not designed to ingest terabytes of logs daily, nor does it support time-series or full-text search at the scale described. It also cannot provide sub-second analytical responses across billions of telemetry entries.

Azure Data Factory, option D, is an orchestration and ETL service used for data movement and transformation. It does not provide interactive analytics capabilities and cannot replace a true analytical engine like Data Explorer.

Azure Data Explorer uses Kusto Query Language (KQL), which is designed for exploratory analytics across semi-structured data. It includes native support for time-series functions, join operations, windowing functions, aggregation, machine learning extensions such as anomaly detection, and pattern recognition. KQL is extremely efficient for scanning billions of rows in seconds due to the engine’s columnar storage, automatic indexing, caching, and compression.

Data Explorer integrates seamlessly with Azure Monitor Logs, Application Insights, IoT Hub, Event Hub, and many telemetry-producing systems. It acts as a specialized analytics engine for data that requires fast insights and rapid querying at scale.

Thus, option B is correct.

Question 72:

A business wants to implement a cost-effective data warehouse architecture where raw data is stored cheaply in Azure Data Lake, and SQL queries are executed on-demand without provisioning compute clusters. They also want BI tools to connect to this layer for ad-hoc analysis. Which architectural approach should they adopt?

Options:

A) Traditional warehouse using Azure SQL Database
B) Lakehouse architecture using Synapse Serverless SQL
C) NoSQL architecture using Azure Cosmos DB
D) Time-series solution using Azure Data Explorer

Answer: B

Explanation:

A lakehouse architecture using Synapse Serverless SQL is the correct choice because it enables organizations to store raw data cost-effectively in Azure Data Lake while allowing SQL-based analytics on-demand without provisioning dedicated compute resources. The business in the scenario wants to pay only for what they query, avoid maintaining clusters, and support direct BI tool integration. Synapse Serverless SQL perfectly aligns with these needs, offering a serverless distributed query engine that operates directly on files stored in data lake formats like parquet, CSV, and JSON.

Option A, using Azure SQL Database as a traditional warehouse, requires provisioning compute and storage resources upfront. It is not cost-effective for scenarios involving massive raw data stored cheaply in a data lake, nor does it support on-demand SQL across semi-structured files in storage.

Option C, NoSQL architecture using Azure Cosmos DB, is not intended for warehousing or analytical query workloads. Cosmos DB is optimized for operational workloads with low latency across distributed regions.

Option D, time-series solution using Azure Data Explorer, is optimized for logs and telemetry analytics but not for ad-hoc SQL-based BI querying across large data lake files in standard warehouse formats.

The lakehouse model combines the strengths of data lakes and warehouses. Raw and curated data is stored in open file formats such as parquet, allowing low-cost storage and compatibility with many big data processing engines. Synapse Serverless SQL adds an easy-to-use SQL layer capable of querying this data directly.

This architecture supports:

BI tools such as Power BI and Tableau

Schema inference

External table definitions

Views and transformations over data lake files

Pay-per-query billing

No cluster provisioning or maintenance

These benefits allow companies to build flexible, scalable, and cost-optimized analytical platforms without the overhead of maintaining a traditional provisioned warehouse environment.

Thus, option B is correct.

Question 73:

An organization wants to ensure that personally identifiable information in their Azure SQL Database—such as names, phone numbers, and email addresses—is not visible to unauthorized users, yet they do not want to modify the underlying data. They need a feature that masks sensitive fields during query results without changing the stored values. Which Azure SQL feature meets this requirement?

Options:

A) Transparent Data Encryption
B) Dynamic Data Masking
C) Always Encrypted
D) Row-Level Security

Answer: B

Explanation:

Dynamic Data Masking is the correct option because it allows organizations to protect sensitive data by masking it in query results without modifying the actual stored data. The scenario describes a business requirement where unauthorized users should not see real PII values, but the underlying data must remain intact. Dynamic Data Masking applies masking rules at query execution time, displaying only masked versions of the fields depending on user permissions.

Transparent Data Encryption (TDE), option A, encrypts data at rest but does not prevent users from viewing sensitive data if they have database access.

Always Encrypted, option C, encrypts data both in storage and during computation, requiring client-side encryption keys. It is designed for end-to-end data protection but requires application changes and does not simply mask values—it encrypts them, preventing even the database engine from seeing plaintext. This is more complex than what the scenario requires.

Row-Level Security, option D, restricts which rows users can see but does not mask column-level data.

Dynamic Data Masking provides simple masking formats including partial masking, email masking, password masking, and custom string masking. It enables organizations to prevent accidental exposure of sensitive data during development, reporting, or shared analytics environments. It requires minimal configuration and integrates seamlessly into existing SQL systems.
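As a sketch of how these rules are applied, assuming a hypothetical Customers table, the following T-SQL (submitted here through pyodbc) masks email, phone, and name columns; users without the UNMASK permission then see only masked values in query results, while the stored data is unchanged.

```python
# Sketch of adding Dynamic Data Masking rules; table and column names are
# hypothetical, and the connection string placeholder must be filled in.
import pyodbc

masking_rules = """
ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

ALTER TABLE dbo.Customers
    ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'default()');

ALTER TABLE dbo.Customers
    ALTER COLUMN FullName ADD MASKED WITH (FUNCTION = 'partial(1, "XXXXX", 0)');
"""

conn = pyodbc.connect("<AZURE_SQL_DATABASE_ODBC_CONNECTION_STRING>")
conn.execute(masking_rules)
conn.commit()
```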

Thus, option B is correct.

Question 74:

A retail company wants to implement a highly scalable data ingestion system that automatically scales based on volume, provides real-time stream processing, and integrates with both Spark and SQL engines for downstream analytics. They also want the system to handle bursts of millions of incoming events during peak sales. Which combination of services should they use?

Options:

A) Azure SQL Database and Azure Data Factory
B) Azure Event Hubs and Azure Stream Analytics
C) Azure Cosmos DB and Azure App Service
D) Azure Data Lake and Azure HDInsight

Answer: B

Explanation:

Azure Event Hubs combined with Azure Stream Analytics is the correct solution because this pairing provides a scalable event ingestion pipeline capable of handling millions of events per second while offering real-time stream processing. Event Hubs acts as the ingestion layer, designed to manage extremely high throughput from retail transactions, mobile apps, IoT sensors, and clickstreams. It supports elastic scaling, partitioning, and multiple consumer groups.

Azure Stream Analytics adds real-time computation with SQL-like queries, enabling event filtering, aggregation, windowing, anomaly detection, and pattern matching. Stream Analytics can deliver processed results to Azure Synapse, Databricks, Power BI, Azure SQL, and data lakes.

Option A, Azure SQL Database and Data Factory, cannot handle real-time ingestion or streaming analytics. SQL Database is not suitable for large bursts of inbound events, and Data Factory runs in batch mode.

Option C, Cosmos DB and App Service, focuses on operational workloads and web hosting but does not deliver streaming analytics or high-throughput ingestion for millions of events per second.

Option D, Data Lake and HDInsight, can process big data but requires cluster provisioning and lacks the elasticity and real-time features provided by Event Hubs and Stream Analytics.

By combining Event Hubs and Stream Analytics, the retail company gains an end-to-end streaming pipeline capable of supporting real-time dashboards, ML model scoring, anomaly alerts, and data lake archival.

Thus, option B is correct.

Question 75:

A corporation needs to implement centralized governance for its entire analytics ecosystem. This includes cataloging data assets, scanning SQL databases and data lakes for metadata, detecting sensitive information, maintaining lineage between data sources and pipelines, and enabling business users to search for trusted datasets. Which Azure service provides these capabilities?

Options:

A) Azure Key Vault
B) Microsoft Purview
C) Azure Security Center
D) Azure Automation

Answer: B

Explanation:

Microsoft Purview is the correct service because it provides comprehensive data governance, cataloging, metadata scanning, lineage tracking, and data classification across Azure and non-Azure systems. The scenario requires capabilities such as scanning SQL databases, data lakes, and other sources; discovering sensitive information; maintaining lineage between pipelines; and building a searchable catalog for business users. Purview is designed specifically for these tasks.

Azure Key Vault, option A, stores secrets and certificates but does not manage data governance or metadata.

Azure Security Center, option C, focuses on cloud resource security posture but does not provide data lineage or cataloging.

Azure Automation, option D, automates scripts and operational tasks but has no data governance capabilities.

Purview integrates with Azure Data Factory, Azure Synapse, SQL databases, and Power BI. It builds lineage maps that show how data flows from ingestion to transformation to consumption. It classifies data using built-in and custom rules, helping organizations maintain compliance with regulations such as GDPR, HIPAA, and internal governance policies.

Thus, option B is correct.

Question 76:

A company stores massive amounts of historical customer interaction data in Azure Data Lake as parquet files. Their data science team needs a service that can efficiently run large-scale distributed machine learning training jobs, perform advanced statistical analysis, and support collaborative notebook-based development using Python. They also want the ability to configure autoscaling clusters to optimize cost. Which Azure service should they choose?

Options:

A) Azure Synapse Serverless SQL
B) Azure Databricks
C) Azure Stream Analytics
D) Azure Cosmos DB

Answer: B

Explanation:

Azure Databricks is the best choice because the scenario describes large-scale machine learning workloads, advanced statistical analysis, and collaborative notebook environments that require distributed compute. Azure Databricks provides a fully managed, optimized Apache Spark platform that supports large-scale data science, machine learning, and ETL operations. It integrates deeply with Azure Data Lake, enabling direct reading and writing of parquet files at massive scale. The presence of autoscaling clusters ensures that compute resources can automatically adjust to the complexity and size of the workload, reducing cost while ensuring performance.

Option A, Azure Synapse Serverless SQL, is an excellent service for on-demand SQL analytics but does not support distributed machine learning or Python-based training jobs. It is designed more for lightweight SQL-based exploration rather than large-scale data science workflows. Additionally, it does not provide autoscaling Spark clusters or notebook environments suited for iterative machine learning work.

Option C, Azure Stream Analytics, is specialized for real-time streaming processes and event-based analytics. It cannot perform large-scale historical data analysis or machine learning training on parquet files. It focuses on SQL-like stream processing rather than heavy data science workloads.

Option D, Azure Cosmos DB, is a globally distributed NoSQL database designed for operational workloads, not large-scale distributed data science or machine learning. It cannot execute distributed ML training or notebook-based workflows. Cosmos DB supports transactional workloads and global distribution but not distributed ML at scale.

Azure Databricks supports collaborative work through shared notebooks, allowing Python, SQL, R, and Scala code to run alongside visualizations and documentation. Databricks also integrates MLflow, enabling end-to-end tracking of experiments, hyperparameters, metrics, and model versions. This is critical for teams that need structured model management, reproducibility, and experiment comparison.
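A minimal MLflow tracking sketch, using synthetic stand-in data where a real notebook would read parquet from the lake, shows the experiment-logging pattern described above:

```python
# MLflow tracking sketch with synthetic data; in Databricks the features would
# come from the lake, e.g. spark.read.parquet(...).toPandas().
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

with mlflow.start_run(run_name="churn-baseline"):
    mlflow.log_param("n_estimators", 200)          # hyperparameter tracked per run

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, y)

    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")       # versioned model artifact
```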

The underlying Spark clusters allow Databricks to execute transformations, joins, aggregations, and machine learning algorithms at distributed scale. Autoscaling dynamically adjusts the number of workers based on workload demand, preventing overspending while ensuring performance. Databricks also offers performance optimizations such as caching, adaptive query execution, and optimized connectors for Azure Data Lake and Blob Storage.

Furthermore, Delta Lake transforms parquet storage into a more reliable and performant format with ACID transactions, schema enforcement, and time travel capabilities. This significantly benefits data scientists working with evolving datasets and feature engineering processes.

Because the scenario explicitly calls for large-scale distributed ML, advanced analytics, and collaborative Python development, Databricks uniquely fits these requirements. Thus, option B is correct.

Question 77:

A financial institution wants to build a real-time fraud detection system that evaluates incoming transactions within milliseconds. They need a system capable of processing live event streams, applying rules, aggregations, and anomaly detection, and sending alerts immediately. Which Azure service is best suited for real-time stream processing in this scenario?

Options:

A) Azure Stream Analytics
B) Azure Data Factory
C) Azure Synapse Dedicated SQL Pool
D) Azure HDInsight

Answer: A

Explanation:

Azure Stream Analytics is the best service for this scenario because it is specifically designed for real-time ingestion and processing of streaming data. Fraud detection requires evaluating incoming transactions in milliseconds, applying business rules, aggregations, window functions, and anomaly detection algorithms. Stream Analytics supports all these operations using a SQL-like query language that is easy to implement and manage.

Option B, Azure Data Factory, cannot handle real-time requirements because it is a batch-oriented ETL and ELT orchestration tool. It is ideal for scheduled workloads but not millisecond-level event processing.

Option C, Azure Synapse Dedicated SQL Pool, is excellent for analytical workloads but cannot process streams of live transactions in real time. It requires data to be landed first before processing and does not offer sub-second responsiveness.

Option D, Azure HDInsight, provides a cluster-based big data ecosystem but introduces latency due to cluster provisioning and is not designed for ultra-low-latency streaming.

Stream Analytics can ingest data from a variety of sources including Event Hubs, IoT Hub, and Kafka. In a fraud detection scenario, transactions typically arrive in real time from payment terminals, mobile apps, or financial applications. Stream Analytics processes events using tumbling, sliding, or hopping windows, enabling detection of anomalies such as repeated attempts, unusual patterns, or suspicious geographic activity.
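For illustration, the snippet below holds the kind of SQL-like tumbling-window query a Stream Analytics job might run to flag cards with bursts of transactions; the input and output aliases are hypothetical, and in practice the query text lives in the job definition rather than in Python.

```python
# Hypothetical input/output aliases; a sketch of the query text a Stream
# Analytics job might use. Held in a Python string here only for reference.
FRAUD_DETECTION_QUERY = """
SELECT
    CardNumber,
    COUNT(*) AS AttemptCount,
    SUM(Amount) AS TotalAmount,
    System.Timestamp() AS WindowEnd
INTO
    FraudAlerts
FROM
    Transactions TIMESTAMP BY EventTime
GROUP BY
    CardNumber,
    TumblingWindow(minute, 2)
HAVING
    COUNT(*) > 5
"""
```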

Stream Analytics supports real-time scoring using machine learning models exported from Azure Machine Learning or other compatible frameworks. These models can be embedded directly into Stream Analytics queries to evaluate transaction risk scores as the data flows in.

It can output alerts to Azure Functions, SQL databases, Event Hubs, Service Bus, or dashboards like Power BI for real-time monitoring. This ability to trigger automated responses is critical in fraud detection systems where immediate action is needed.

The service is fully managed and auto-scales, ensuring that it can handle fluctuations in transaction volume during peak periods without manual intervention. For financial workloads, reliability and exactly-once event processing are essential, and Stream Analytics meets these requirements through checkpointing and robust state management.

Thus, option A is correct.

Question 78:

A company plans to migrate their on-premises SQL Server databases to Azure but wants to retain full SQL Server compatibility, including SQL Agent, cross-database queries, and instance-level features. They also need minimal code changes and built-in automatic backups. Which Azure service should they choose?

Options:

A) Azure SQL Database
B) Azure Cosmos DB
C) Azure SQL Managed Instance
D) Azure PostgreSQL

Answer: C

Explanation:

Azure SQL Managed Instance is the most suitable service because it provides near-total compatibility with on-premises SQL Server environments. This includes SQL Agent, cross-database queries, linked servers, instance-level features, server-level collation settings, CLR support, and many other SQL Server components that are not available in Azure SQL Database.

The scenario highlights critical needs:

Retention of SQL Server compatibility

Support for SQL Agent

Cross-database functionality

Minimal code changes

Built-in backups

Managed Instance checks all these boxes, enabling organizations to lift and shift their databases with minimal refactoring.

Option A, Azure SQL Database, supports many SQL Server features but lacks cross-database queries, instance-level capabilities, and SQL Agent. The deployment model is database-scoped rather than instance-scoped, making certain workloads difficult to migrate without redesigning architecture.

Option B, Cosmos DB, is a NoSQL distributed platform unrelated to SQL Server workloads and cannot fulfill relational or compatibility needs.

Option D, Azure PostgreSQL, supports the PostgreSQL relational engine but is not compatible with SQL Server T-SQL, SQL Agent, or its ecosystem.

Azure SQL Managed Instance provides automated backups, built-in HA, point-in-time restore, and integration with Azure AD authentication. For hybrid scenarios, it supports VPN gateways and ExpressRoute, making it easier to integrate with on-premises networks.

Because the organization needs full SQL Server compatibility and minimal migration effort, Azure SQL Managed Instance is the best fit. Thus, option C is correct.

Question 79:

A global e-commerce company needs a multi-region distributed database that provides low latency for users worldwide. They require automatic global replication, multi-master write capability, and the ability to choose consistency levels based on application needs. Which Azure database service is designed for this scenario?

Options:

A) Azure SQL Database
B) Azure Database for MySQL
C) Azure Cosmos DB
D) Azure Synapse Dedicated SQL Pool

Answer: C

Explanation:

Azure Cosmos DB is the strongest match for this scenario because it is designed for global distribution, low-latency access, and multi-master writes. The company requires multi-region replication, low latency for users worldwide, and customizable consistency levels. Cosmos DB supports automatic global distribution by replicating data across regions in seconds and provides multiple consistency models ranging from strong to eventual.

Option A, Azure SQL Database, provides geo-replication but does not support multi-master writes or flexible consistency policies.

Option B, Azure Database for MySQL, supports replication but not millisecond-level global distribution or multi-region writes.

Option D, Synapse Dedicated SQL Pool, is a data warehouse platform not suited for global transactional application workloads.

Cosmos DB supports five consistency levels:

Strong

Bounded staleness

Session

Consistent prefix

Eventual

This flexibility allows developers to optimize latency and correctness depending on workload requirements. Cosmos DB guarantees less than 10 ms read and write latency at the 99th percentile for globally distributed setups.
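A brief sketch with the azure-cosmos Python package, using hypothetical account, database, container, and partition-key names, shows session consistency being selected at the client and an order document being upserted:

```python
# Sketch using the azure-cosmos package; account URI, key, names, and the
# customerId partition key are hypothetical.
from azure.cosmos import CosmosClient

client = CosmosClient(
    url="https://myaccount.documents.azure.com:443/",
    credential="<ACCOUNT_KEY>",
    consistency_level="Session",   # per-client choice within the account's configured default
)

container = client.get_database_client("commerce").get_container_client("orders")

# With multi-region writes enabled, this write can land in the nearest region.
container.upsert_item({
    "id": "order-1001",
    "customerId": "cust-42",
    "region": "eu-west",
    "total": 59.90,
})
```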

For multi-master writes, Cosmos DB uses conflict resolution policies that allow deterministic conflict resolution rules or custom logic. This is essential for global e-commerce systems where updates may originate from various locations simultaneously.

Cosmos DB also supports multiple APIs including SQL, MongoDB, Cassandra, Gremlin, and Table APIs, providing flexibility for different development teams.

Thus, option C is correct.

Question 80:

A company needs to build a batch ETL system that loads data from on-premises SQL Server into Azure Data Lake, performs transformations, and then loads the processed data into Azure Synapse Dedicated SQL Pool for reporting. They want a service that supports triggers, scheduling, data flows, and integration with on-premises data securely. Which Azure service should they choose?

Options:

A) Azure Data Factory
B) Azure SQL Managed Instance
C) Azure Monitor
D) Azure Event Grid

Answer: A

Explanation:

Azure Data Factory is the correct service because it is designed for large-scale batch ETL workloads, data integration, and orchestration. The scenario describes a classic ETL workflow that includes:

Extracting data from on-premises SQL Server

Loading it into Azure Data Lake

Transforming the data

Moving transformed data into Synapse Dedicated SQL Pool

Data Factory supports all of these tasks while providing secure connectivity to on-premises data sources via self-hosted integration runtime. This allows data to be pulled securely through encrypted channels without exposing internal networks.

Option B, Azure SQL Managed Instance, is a relational database service and cannot orchestrate ETL operations or pipeline workloads.

Option C, Azure Monitor, provides observability and logging services, not ETL pipeline orchestration or data movement.

Option D, Azure Event Grid, routes lightweight events between services but cannot orchestrate multi-step ETL pipelines or schedule workloads.

Azure Data Factory Mapping Data Flows provides a no-code transformation engine running on managed Spark clusters, enabling scalable computation. Pipelines allow scheduling, triggers, loops, conditional logic, and integration with Azure Synapse, Azure Data Lake, Azure SQL, and external systems. Data Factory also supports ingesting data from various sources using connectors for SQL Server, Oracle, SAP, Salesforce, and many others.
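As a small illustration, the sketch below uses the azure-mgmt-datafactory package to start a run of an existing pipeline with hypothetical resource names; in production, schedule or tumbling-window triggers would normally start these runs instead of a manual call.

```python
# Sketch of triggering an existing Data Factory pipeline run; subscription,
# resource group, factory, pipeline, and parameter names are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
)

run = adf_client.pipelines.create_run(
    resource_group_name="analytics-rg",
    factory_name="etl-factory",
    pipeline_name="LoadSalesToSynapse",
    parameters={"loadDate": "2024-01-31"},
)
print("Started pipeline run:", run.run_id)
```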

It is ideal for nightly batch ETL workloads, enterprise data migration, and data warehouse loading. With its monitoring dashboard, users can track pipeline runs, successes, failures, and retry operations.

Because the company needs scheduling, transformation capabilities, secure on-premises connectivity, and orchestration, Azure Data Factory is the perfect match. Thus, option A is correct.

 
