Snowflake SnowPro Core Recertification (COF-R02) Exam Dumps and Practice Test Questions Set6 Q101-120

Visit here for our full Snowflake SnowPro Core exam dumps and practice test questions.

Q101. A logistics company wants to analyze real-time sensor data from its delivery trucks. The data includes temperature, location, and fuel consumption. The payload schema changes frequently, and analysts need to run ad-hoc queries on both raw and curated datasets. Which Snowflake approach best supports this scenario?

A Flatten all incoming data into fixed columns before ingestion
B Load data into VARIANT columns in a raw landing table and transform downstream
C Enforce strict typed schemas on ingestion to prevent schema drift
D Store all data externally and query using external tables

Answer: B

Explanation:

Option B is correct because VARIANT columns in Snowflake allow storing semi-structured data such as JSON, XML, or Avro. By landing raw sensor data in a VARIANT column, the company can preserve the original structure of the data while allowing downstream transformations for curated analytics. This approach supports schema-on-read, enabling analysts to extract new fields or nested arrays dynamically as the schema evolves. Snowflake's FLATTEN function, path extraction with GET, and dot notation make it efficient to query nested structures, keeping the solution highly adaptable.
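
As a rough, illustrative sketch (the table and field names below are assumptions, not part of the scenario), the landing pattern looks like this:

CREATE TABLE raw_sensor_events (          -- illustrative names
    ingested_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    payload     VARIANT
);

-- Schema-on-read: fields are extracted at query time, so new attributes need no DDL changes
SELECT
    payload:truck_id::STRING     AS truck_id,
    payload:temperature::FLOAT   AS temperature,
    payload:location.lat::FLOAT  AS latitude,
    payload:fuel.consumed::FLOAT AS fuel_consumed
FROM raw_sensor_events;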

Option A is not ideal because flattening data at ingestion creates rigidity. Any new attributes require schema modifications, leading to ingestion errors and high operational overhead.

Option C fails because strict schemas block ingestion of evolving payloads, which is problematic for dynamic IoT or sensor-based data.

Option D is suboptimal because external storage adds latency and reduces query efficiency. Snowflake's micro-partition optimizations do not apply to external tables, so joins and transformations become slower, especially with large datasets.

Therefore, B provides the most flexible, performant, and Snowflake-native solution for dynamic, semi-structured sensor data.

Q102. An e-commerce company experiences slow queries when filtering large order tables by customer_id and order_date. Analysts frequently run reports grouped by region. Which Snowflake feature will most efficiently improve query performance for these high-selectivity filters?

A Increase warehouse size
B Define clustering keys on customer_id and order_date
C Store data in external tables
D Flatten tables into fewer columns

Answer: B

Explanation:

Option B is correct because clustering keys allow Snowflake to physically organize micro-partitions based on frequently queried columns such as customer_id and order_date. With clustering, query pruning is more effective, meaning fewer micro-partitions are scanned, which significantly improves query performance. Snowflake's automatic clustering service continuously reclusters micro-partitions in the background as new data is ingested, ensuring queries remain fast even as the table grows.
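
A minimal sketch of this approach (assuming the order table is simply named orders):

ALTER TABLE orders CLUSTER BY (customer_id, order_date);
-- Automatic Clustering then maintains this ordering in the background as new data arrives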

Option A increases warehouse size, which may improve performance temporarily but does not reduce the amount of scanned data. It increases compute costs without addressing the root cause of slow queries.

Option C is suboptimal because external tables do not leverage Snowflake’s micro-partitioning optimizations. Queries on external data are slower and may not scale efficiently for billions of rows.

Option D is counterproductive. Flattening tables does not improve pruning efficiency and may increase storage usage while making queries more complex.

Thus, B offers a scalable, native solution that balances query performance and cost efficiency.

Q103. A healthcare company needs to ensure analysts can access patient data for analytics while complying with HIPAA regulations. Certain columns like patient_name and social_security_number must be masked dynamically depending on user roles. Which Snowflake strategy is most appropriate?

A Dynamic data masking and row access policies applied to secure views
B Materialized views with aggregated metrics
C Read-only access to raw patient tables
D Store anonymized copies externally

Answer: A

Explanation:

Option A is correct because dynamic data masking and row access policies provide column-level and row-level security. Secure views ensure users cannot bypass restrictions, while dynamic masking automatically obfuscates sensitive fields depending on the user’s role. Row access policies limit visibility of specific records. This approach allows analysts to run meaningful analytics on authorized data while remaining fully compliant with HIPAA and other privacy regulations.
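
For illustration only (the role, table, and policy names are assumptions), a masking policy on one sensitive column might look like this:

CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('PHI_FULL_ACCESS') THEN val   -- illustrative role name
        ELSE '***MASKED***'
    END;

ALTER TABLE patients MODIFY COLUMN patient_name SET MASKING POLICY mask_pii;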

Option B is insufficient because materialized views only provide aggregated results. Analysts may need granular data for valid analysis, which is not available with pre-aggregated views.

Option C is not secure enough. Read-only access alone does not mask sensitive fields and may expose personal data.

Option D adds complexity and requires ETL pipelines for external storage. This approach introduces latency and limits Snowflake-native performance optimizations.

Therefore, A is the best solution for scalable, compliant data access.

Q104. A company is designing an ETL pipeline where only new or changed records need to be processed. The source tables experience high transaction rates, and transformations must maintain consistency. Which Snowflake feature is best suited for this requirement?

A Streams on source tables
B Time Travel on staging tables
C Secure views on staging tables
D Manual audit tables populated during ETL

Answer: A

Explanation:

Option A is correct because streams capture insert, update, and delete changes on source tables. By querying the stream, ETL processes can handle only the delta changes, reducing unnecessary scans of entire tables. Streams maintain transactional consistency, ensuring that all operations are captured exactly once. They also integrate seamlessly with Snowflake tasks, allowing scheduling and automation of incremental transformations. This approach is reliable, efficient, and native to Snowflake.
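
As a simplified sketch (object and column names are assumptions), the incremental pattern is:

CREATE STREAM orders_stream ON TABLE orders;

-- Reads only the delta since the last run; consuming the stream in a DML statement advances its offset
INSERT INTO orders_staging (order_id, customer_id, order_total)
SELECT order_id, customer_id, order_total
FROM orders_stream
WHERE METADATA$ACTION = 'INSERT';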

Option B is insufficient because Time Travel is designed for historical querying but does not provide incremental change tracking efficiently. Comparing historical snapshots to current data adds computational overhead.

Option C is incorrect because secure views are unrelated to incremental ETL. They enforce access control but do not track changes.

Option D is impractical because manual audit tables require additional ETL logic, are prone to errors, and cannot guarantee exactly-once processing for multi-statement transactions.

Thus, A is the optimal choice for incremental, transactional ETL pipelines.

Q105. A financial services company must retain historical snapshots of key tables for audit purposes while minimizing storage costs. Analysts need to query the data exactly as it existed months or years ago. Which Snowflake approach is most appropriate?

A Zero-copy clones combined with Time Travel
B Full physical copies for each snapshot
C Store historical data externally and reload as needed
D Maintain manual history tables during ETL

Answer: A

Explanation:

Option A is correct because zero-copy clones allow creating point-in-time snapshots without duplicating underlying storage, while Time Travel provides access to historical versions of the table. This approach is storage-efficient and fully audit-compliant. Analysts can query data exactly as it existed at a previous point in time without complex ETL pipelines. Because a clone is independent of its source, later changes to the source table do not affect the snapshot, providing a reliable historical record.
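
A brief sketch (names and timestamp are illustrative, and the timestamp must fall within the Time Travel retention window):

-- Zero-copy snapshot of the table as it existed at a point in time
CREATE TABLE transactions_snapshot_2024_06_30 CLONE transactions
    AT (TIMESTAMP => '2024-06-30 23:59:59'::TIMESTAMP_LTZ);

-- Or query the historical state directly with Time Travel
SELECT * FROM transactions
    AT (TIMESTAMP => '2024-06-30 23:59:59'::TIMESTAMP_LTZ);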

Option B is inefficient due to storage duplication and increased management overhead.

Option C adds latency and operational complexity; external reloads risk inconsistencies and are not Snowflake-native.

Option D is error-prone and requires additional ETL logic, which increases the risk of inconsistencies and operational burden.

Therefore, A provides a scalable, efficient, and regulatory-compliant historical data solution.

Q106. An organization wants to allow multiple teams to query large datasets simultaneously without impacting each other’s performance. Which Snowflake feature provides the best solution?

A Multi-cluster warehouses with auto-scaling
B Increase virtual warehouse size manually for each team
C Store all data in a single warehouse
D Use external tables for separation

Answer: A

Explanation:

Option A is correct because multi-cluster warehouses represent Snowflake’s purpose-built solution for high-concurrency environments, enabling multiple compute clusters to process queries independently and simultaneously while maintaining complete workload isolation between different user groups, applications, or business functions. Auto-scaling functionality dynamically provisions additional clusters automatically when query queuing occurs or when active query counts exceed configured thresholds, seamlessly handling increased workloads during peak usage periods without requiring manual intervention or affecting query performance for other teams already executing workloads on existing clusters. This elastic scaling ensures consistent query response times across all user groups regardless of concurrent demand fluctuations, eliminates resource contention that would otherwise force queries into queues awaiting available compute capacity, and optimizes operational costs since clusters scale up only when actually needed to satisfy demand and automatically scale down during periods of reduced activity, preventing organizations from perpetually paying for excessive compute capacity maintained solely to handle occasional peak loads. Snowflake’s cloud-native architecture enables complete isolation of computational resources through virtual warehouse separation, where each team or workload category can be assigned dedicated warehouses with independent scaling policies, size configurations, and cost attribution, making this approach ideally suited for multi-tenant analytics platforms, shared data environments serving diverse business units, and high-concurrency scenarios where predictable performance and fair resource allocation across competing workloads represent critical operational requirements.
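
Multi-cluster warehouses require Enterprise Edition or higher. As a sketch (the warehouse name and limits are assumptions), the configuration might look like this:

CREATE WAREHOUSE analytics_wh
    WAREHOUSE_SIZE    = 'MEDIUM'
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 4          -- extra clusters start only when queries begin to queue
    SCALING_POLICY    = 'STANDARD'
    AUTO_SUSPEND      = 300
    AUTO_RESUME       = TRUE;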

Option B fails to guarantee workload isolation. Without multi-cluster capability, a long-running query can monopolize a team's warehouse and force subsequent queries to queue, and manual resizing is operationally inefficient: it reacts slowly to demand changes, risks over-provisioning during quiet periods or under-provisioning during peaks, and requires continuous monitoring that defeats the purpose of cloud elasticity.

Option C creates severe resource contention. Forcing all teams onto a single warehouse means large analytical queries starve smaller operational queries, latency becomes unpredictable because it depends on whatever else happens to be running, and there is no way to prioritize workloads, attribute costs to specific business units, or guarantee performance for time-sensitive applications.

Option D separates storage, not compute. Queries against external tables still execute on Snowflake warehouses and compete for the same capacity, while also incurring extra latency for reading data across network boundaries and losing native optimizations such as micro-partition pruning, automatic clustering, and result caching.

Therefore, A is the Snowflake-native solution designed for concurrent workloads that require performance isolation, elastic capacity, cost efficiency, and predictable query execution across multiple user populations.

Q107. A company is ingesting JSON and AVRO data from multiple IoT devices. The payload structure varies frequently, and analysts need to access both raw and curated data. Which Snowflake design pattern best accommodates this requirement?

A Store data in VARIANT columns in a raw table and transform downstream
B Flatten data into structured columns at ingestion
C Enforce strict schemas to prevent schema drift
D Store data externally and query via external tables

Answer: A

Explanation:

Option A is correct because VARIANT columns in Snowflake enable native storage of semi-structured data formats including JSON, Avro, ORC, Parquet, and XML without requiring predefined schemas, making this approach ideally suited for IoT datasets where sensor attributes, telemetry fields, and metadata structures frequently evolve as new device types are deployed, firmware updates introduce additional metrics, or business requirements demand capture of previously untracked parameters. Storing raw IoT data in a landing table with VARIANT columns ensures complete preservation of all original information exactly as received from devices, preventing data loss that might occur if ingestion processes attempted to map incoming data to rigid predefined schemas that lack fields for unexpected attributes. This raw data preservation proves invaluable for analysts conducting exploratory analysis, troubleshooting sensor anomalies, investigating historical incidents, or developing new analytics use cases that require access to attributes not originally considered important enough to extract during initial ETL processing. Simultaneously, downstream ETL processes can systematically transform raw VARIANT data into curated analytical tables with strongly-typed columns optimized for specific business intelligence queries, machine learning model training, or operational dashboards, creating a multi-layered data architecture that balances flexibility with query performance. This approach handles schema evolution gracefully without requiring manual intervention, schema modifications, or pipeline reconfiguration when IoT devices begin reporting new attributes, as the VARIANT column transparently accommodates any valid JSON structure. Snowflake’s schema-on-read paradigm allows queries to extract and transform semi-structured data at query time using powerful semi-structured functions including FLATTEN for array expansion, OBJECT_INSERT for adding attributes, GET for path-based extraction, and dot notation for intuitive field access, enabling analysts to work with evolving schemas without database administrator involvement.
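
A hedged sketch of the downstream transformation (field names such as device_id and readings are assumptions):

CREATE OR REPLACE TABLE curated_readings AS
SELECT
    raw.payload:device_id::STRING          AS device_id,
    raw.payload:reported_at::TIMESTAMP_NTZ AS reported_at,
    r.value:metric::STRING                 AS metric,
    r.value:reading::FLOAT                 AS reading
FROM raw_iot_events AS raw,
     LATERAL FLATTEN(INPUT => raw.payload:readings) AS r;   -- one output row per array element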

Option B introduces rigidity by flattening data at ingestion into strongly typed columns. When devices transmit attributes that are not present in the predefined schema, pipelines fail or drop data, forcing emergency schema modifications and redeployments.

Option C is unsuitable for evolving IoT payloads because strict schemas block ingestion of messages containing new or unexpected fields, leaving the organization to choose between rejecting valuable sensor data and maintaining complex schema versioning and migration processes.

Option D adds latency and reduces performance because external tables in object storage such as S3 or Azure Blob Storage cannot leverage Snowflake's micro-partitioning, automatic clustering, result caching, or statistics-based query planning, resulting in slower queries and higher compute costs than native tables.

Therefore, Option A provides the best combination of schema flexibility, raw-data preservation, ingestion scalability, and Snowflake-native efficiency for semi-structured IoT data.

Q108. Analysts frequently filter a large retail sales table by region, product_category, and order_date. Queries have become slower as data grows. Which optimization best improves query performance?

A Define clustering keys on region, product_category, and order_date
B Increase warehouse size
C Store data externally
D Flatten all columns

Answer: A

Explanation:

Option A is correct because clustering keys optimize micro-partition pruning by aligning Snowflake’s physical data organization with common query filter patterns, ensuring queries scan only relevant partitions rather than reading entire tables during execution. When you define clustering keys on columns frequently used in WHERE clauses, JOIN conditions, or GROUP BY operations, Snowflake organizes micro-partitions so that rows with similar values for these dimensions are physically co-located within the same or adjacent storage blocks, creating a natural data ordering that the query optimizer exploits during execution planning. This physical alignment enables the optimizer to examine micro-partition metadata containing minimum and maximum values for clustered columns and instantly eliminate partitions that cannot possibly contain matching rows based on query predicates, a process called partition pruning that can reduce scanned data volumes by 90% or more for well-clustered tables. Snowflake automatically maintains clustering quality as new data is ingested through its automatic clustering service, which continuously monitors clustering depth and overlap metrics, identifies partitions that have degraded below optimal thresholds due to ongoing DML operations, and transparently reorganizes micro-partitions in the background to restore optimal physical layout without requiring manual intervention, maintenance windows, or administrative overhead. This automatic maintenance ensures clustering benefits persist over time even as tables grow to billions of rows and experience constant insert, update, and delete operations that would otherwise gradually degrade physical ordering. The combination of effective partition pruning and automatic clustering maintenance reduces query latency by minimizing data scanning, lowers compute costs by decreasing the amount of processing required per query, and improves concurrency by allowing more queries to execute simultaneously within the same compute budget, making clustering keys particularly valuable for large fact tables experiencing high query volumes with predictable filter patterns on dimensions like date ranges, geographic regions, product categories, or customer segments.
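
For example (assuming the table is named retail_sales), defining the key and checking its health:

ALTER TABLE retail_sales CLUSTER BY (region, product_category, order_date);

-- Returns clustering depth and overlap metrics that indicate how effective pruning will be
SELECT SYSTEM$CLUSTERING_INFORMATION('retail_sales', '(region, product_category, order_date)');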

Option B, increasing warehouse size, temporarily improves speed by adding compute, parallelism, and memory, but it does not reduce the volume of data scanned, so costs rise roughly in proportion to the size increase while the underlying inefficiency of full-table scans remains. Warehouse scaling has legitimate uses for concurrency or complex aggregations, but it is not a substitute for data organization that reduces I/O at the source.

Option C, external tables, reference data in cloud object storage without importing it into Snowflake's native columnar format, so they cannot use micro-partition metadata or zone maps for pruning, and they cannot be clustered at all, since clustering requires data to reside in Snowflake-managed storage where the platform controls physical layout. Queries against external tables must read and parse raw files, which is significantly slower than querying native tables holding the same data.

Option D flattening nested or semi-structured columns into multiple separate columns increases storage requirements by duplicating data across additional columns and expanding row width, but it does not improve partition pruning effectiveness or query performance for filter operations unless those flattened columns are also designated as clustering keys. Simply denormalizing data structure changes the schema representation without affecting physical data organization or enabling the optimizer to skip irrelevant partitions more effectively, and the increased storage footprint raises costs while potentially degrading performance for queries that don’t benefit from the denormalized structure due to increased I/O requirements for wider rows.

Thus Option A offers the most efficient and cost-effective solution by directly addressing the root cause of slow query performance through intelligent physical data organization that minimizes scanned data volumes, automatically maintains optimization quality over time, and aligns infrastructure costs with actual analytical value delivered rather than simply throwing more compute resources at inefficient workloads.

Q109. A healthcare company needs to maintain strict security for patient data while allowing analysts to perform meaningful analysis. Some columns should be masked based on role. Which solution meets these requirements?

A Secure views with row access policies and dynamic data masking
B Materialized views with aggregated data
C Read-only access to raw tables
D External anonymized copies

Answer: A

Explanation:

Option A is correct because combining secure views, row access policies, and dynamic data masking provides multi-layered protection that addresses different dimensions of data security and privacy while maintaining analytical utility. Row access policies control which records users can see by applying filters based on user roles, departments, regions, or other contextual attributes, ensuring analysts only access data relevant to their organizational scope and preventing unauthorized visibility into restricted customer segments or geographic markets. Dynamic data masking protects sensitive fields like social security numbers, credit card details, email addresses, or salary information by automatically obfuscating or tokenizing these columns for unauthorized users while preserving the underlying data structure and relationships that enable meaningful statistical analysis and trend identification. Secure views prevent bypassing these protective rules by obscuring the underlying query logic and security mechanisms, ensuring users cannot reverse-engineer access patterns or exploit view definitions to circumvent row-level filters or column-level masking policies. This layered security architecture supports regulatory compliance with frameworks like GDPR, HIPAA, CCPA, and PCI-DSS that mandate both access restrictions and data anonymization, while simultaneously enabling meaningful analysis through aggregate statistics, anonymized trends, and pattern recognition that don’t require exposure to personally identifiable information. Analysts can perform cohort analysis, calculate conversion rates, identify behavioral patterns, and generate predictive models using masked datasets that preserve analytical validity while protecting individual privacy, satisfying both compliance requirements and business intelligence needs without forcing organizations to choose between security and insight.
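
A minimal sketch of the layered controls (the mapping table, roles, and object names are assumptions):

CREATE ROW ACCESS POLICY dept_visibility AS (region_value STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'COMPLIANCE_ADMIN'                       -- illustrative privileged role
    OR EXISTS (
        SELECT 1
        FROM region_role_map m                                -- assumed role-to-region mapping table
        WHERE m.role_name = CURRENT_ROLE()
          AND m.region    = region_value
    );

ALTER TABLE patient_visits ADD ROW ACCESS POLICY dept_visibility ON (region);

-- Analysts query through a secure view so the underlying logic cannot be inspected or bypassed
CREATE SECURE VIEW analyst_patient_visits AS
SELECT * FROM patient_visits;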

Option B is limited to pre-aggregated metrics. Materialized views expose totals, averages, and distributions but not record-level data, which prevents exploratory analysis, detailed segmentation, and ad-hoc investigations that need row-level granularity even when that data is masked. This satisfies basic reporting but not sophisticated analytics.

Option C is fundamentally insecure because read-only access prevents modification but does not mask sensitive columns or restrict row visibility. Analysts could still view personally identifiable information and regulated fields directly, creating compliance violations regardless of whether they can change the data.

Option D proposes replicating data to external systems with separate masking infrastructure, which adds architectural complexity, introduces data synchronization challenges, creates latency between source updates and analytical availability, and reduces Snowflake-native performance optimizations while increasing costs through duplicate storage and compute resources across multiple platforms. This approach fragments the data ecosystem, complicates governance, and undermines the unified platform benefits that Snowflake provides.

Therefore, Option A is the optimal, compliant solution that balances rigorous security controls with analytical flexibility through Snowflake’s integrated, high-performance data protection capabilities.

Q110. A company wants to implement an incremental ETL pipeline that only processes new or changed data. The source tables are highly transactional. Which Snowflake feature is most suitable?

A Streams on source tables
B Time Travel
C Secure views
D Manual audit tables

Answer: A

Explanation:

Option A is correct because streams are Snowflake's native change data capture mechanism: they track all inserts, updates, and deletes on a source table, enabling incremental processing that avoids repeatedly scanning and reprocessing unchanged data. A stream maintains a transactional offset against the source table and exposes changed rows through a queryable interface that includes metadata columns such as METADATA$ACTION, indicating whether a row represents an insert or a delete, and METADATA$ISUPDATE, distinguishing true updates from unrelated delete-insert pairs, so ETL logic can see exactly what changed and apply targeted transformations. Pipelines that query the stream read only the delta since the last consumption point instead of scanning millions of unchanged rows, which reduces compute time and cost and enables near-real-time propagation from operational systems into analytics-ready models. Streams respect transactional boundaries, capturing changes atomically as they commit, and they support exactly-once processing because the stream offset advances only when a DML statement such as an INSERT or MERGE consumes the stream, preventing both data loss after failures and duplicate processing after retries. Streams also integrate seamlessly with Snowflake Tasks, so incremental transformations can be scheduled or triggered automatically whenever new changes appear.
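
To illustrate the stream-plus-task pattern (all object names are assumptions; updates and deletes would typically be handled with a MERGE rather than a plain INSERT):

CREATE TASK process_order_changes
    WAREHOUSE = etl_wh
    SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')   -- skip runs when the stream has no new changes
AS
    INSERT INTO curated_orders (order_id, customer_id, order_total)
    SELECT order_id, customer_id, order_total
    FROM orders_stream
    WHERE METADATA$ACTION = 'INSERT';

ALTER TASK process_order_changes RESUME;   -- tasks are created suspended and must be resumed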

Option B, Time Travel, provides point-in-time snapshots rather than a change log. Determining what changed requires expensive comparisons between the current table and a historical snapshot queried with AT or BEFORE, which scales poorly, provides no metadata about the type of change, and cannot distinguish an update from a delete followed by an insert with the same key. Time Travel serves recovery, auditing, and historical analysis, not incremental change tracking.

Option C, secure views, enforce row-level and column-level access control by filtering and masking data based on user context, but they do not track changes or expose modification history, so they are irrelevant to change detection in an ETL pipeline.

Option D manual audit tables require additional application logic, trigger-like mechanisms through stored procedures, or custom instrumentation to capture changes by explicitly logging modification events into separate tracking tables, creating an error-prone approach that introduces complexity, maintenance overhead, and potential reliability issues. This pattern requires developers to remember inserting audit records for every DML operation across all tables, ensuring audit logic executes atomically with business logic to prevent inconsistencies, and handling edge cases like bulk operations, error conditions, and concurrent modifications that could cause audit records to be missed, duplicated, or inconsistent with actual data state. Manual audit tables also consume additional storage for redundant change logs, require custom query logic to parse and process audit records, and lack the transactional guarantees and offset management semantics that streams provide natively, making them fragile compared to Snowflake’s built-in change tracking capabilities.

Thus Option A is the most efficient and Snowflake-native solution, leveraging purpose-built change data capture infrastructure that provides reliable incremental processing with minimal development effort, optimal performance characteristics, and seamless integration with modern data pipeline architectures.

Q111. A Snowflake architect wants to optimize the performance of queries that filter on multiple high-cardinality columns in a large fact table. Which feature is most appropriate?

A Clustering keys
B Search optimization service
C Multi-cluster warehouse scaling
D Time Travel

Correct Answer: A

Explanation:

Option A is correct because clustering keys allow Snowflake to physically organize table micro-partitions according to the values of specified columns. For large fact tables with high-cardinality filtering, clustering minimizes the amount of data scanned during query execution, improving performance significantly. By grouping similar values together, Snowflake can prune irrelevant partitions efficiently. Clustering is particularly effective for repeated queries on predictable columns without restructuring the dataset.

Option B is not correct because the search optimization service is intended for selective point-access queries, not broad analytical filtering over multiple high-cardinality columns. Its effectiveness diminishes in large-scale aggregation queries.

Option C is not correct because multi-cluster warehouses manage concurrency but do not optimize I/O patterns or partition scanning. Adding clusters helps simultaneous queries but does not improve per-query scan efficiency.

Option D is not correct because Time Travel is a historical versioning feature. It allows querying past states of data but has no impact on filtering or query performance.

Thus, clustering keys directly address the problem of high-cardinality filtering.

Q112. A company wants to enforce that all sensitive PII columns are automatically masked based on user roles. Which Snowflake feature should be implemented?

A Dynamic data masking policies
B Row access policies
C Transient tables
D Streams

Correct Answer: A

Explanation:

Option A is correct because dynamic data masking policies allow Snowflake to automatically obfuscate column values depending on the querying user’s role. This ensures that sensitive PII data is never exposed to unauthorized users, while authorized users can still access the unmasked data. Masking policies are applied at the column level and are transparent to queries, reducing operational risk and simplifying compliance.

Option B is not correct because row access policies filter data rows, not column values. They cannot selectively mask PII within a row.

Option C is not correct because transient tables relate to storage lifecycle and do not provide security or masking capabilities.

Option D is not correct because streams track table changes for incremental processing and have no security or masking functionality.

Therefore, dynamic data masking policies are the appropriate solution.

Q113. A data engineer wants to automatically trigger a downstream ETL pipeline in Snowflake whenever new files arrive in an S3 bucket. Which combination provides the lowest latency?

A Snowpipe with event notifications
B Scheduled COPY INTO every hour
C External tables with manual refresh
D Using a task to poll staged data every 15 minutes

Correct Answer: A

Explanation:

Option A is correct because Snowpipe with event notifications allows event-driven ingestion. Cloud storage events notify Snowpipe immediately when new files land in S3, triggering automated, near real-time ingestion into Snowflake. This architecture provides the lowest possible latency for downstream pipelines while eliminating the need for manual intervention or frequent polling.
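
A sketch of the pipe definition (stage, table, and pipe names are assumptions; the S3 event notification itself is configured on the bucket using the notification channel shown by SHOW PIPES):

CREATE PIPE orders_pipe
    AUTO_INGEST = TRUE        -- ingest automatically on cloud storage event notifications
AS
    COPY INTO raw_orders
    FROM @orders_stage
    FILE_FORMAT = (TYPE = 'JSON');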

Option B is not correct because scheduled COPY INTO runs on fixed intervals. It introduces latency proportional to the schedule frequency and wastes compute when no new files exist.

Option C is not correct because external tables only reference external data and require manual or scheduled refresh to see updates. They are not near real-time.

Option D is not correct because polling with tasks adds latency and compute inefficiency compared to event-driven Snowpipe.

Snowpipe with event notifications is the most efficient and low-latency approach.

Q114. A Snowflake administrator notices that certain queries are performing poorly despite having appropriate warehouse sizing. Which Snowflake-native feature should be implemented first to improve large table query performance?

A Clustering keys
B Increasing Time Travel retention
C Transient tables
D Network policies

Correct Answer: A

Explanation:

Option A is correct because clustering keys help optimize data access by physically organizing micro-partitions according to frequently filtered or joined columns. This reduces the number of partitions scanned and significantly improves query performance for large tables. Clustering is especially effective when queries consistently filter or join on specific columns.

Option B is not correct because Time Travel retention impacts historical query access but does not improve query performance.

Option C is not correct because transient tables mainly affect storage retention but provide no query optimization.

Option D is not correct because network policies manage IP-level access and do not influence query execution.

Thus, clustering keys directly address performance bottlenecks for large tables.

Q115. An analyst wants to maintain historical snapshots of a slowly changing dimension in Snowflake without duplicating the entire table each time. Which approach is most efficient?

A Use streams to track changes and insert delta into a history table
B Clone the table on every change
C Create transient tables for historical data
D Use network policies to block updates

Correct Answer: A

Explanation:

Option A is correct because streams efficiently track inserted, updated, or deleted rows. By capturing deltas, only the changes are appended to a historical table, minimizing storage and compute usage while preserving a full history. This pattern enables incremental SCD management without duplicating the entire table for every change.
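
As an illustrative sketch (table and column names are assumptions), appending only the changed rows from a stream into a history table:

CREATE STREAM dim_customer_stream ON TABLE dim_customer;

INSERT INTO dim_customer_history (customer_id, customer_name, change_type, changed_at)
SELECT customer_id, customer_name, METADATA$ACTION, CURRENT_TIMESTAMP()
FROM dim_customer_stream;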

Option B is not correct because cloning the table for every change produces unnecessary storage overhead and is not efficient for incremental snapshots.

Option C is not correct because transient tables do not inherently track changes and provide no incremental snapshot mechanism.

Option D is not correct because network policies have no effect on historical data tracking or SCD management.

Using streams with history tables is the optimal Snowflake-native approach for incremental historical snapshots.

Q116. A Snowflake user wants to optimize queries joining two large tables on multiple columns that are frequently filtered. What method reduces scan times most effectively?

A Clustering keys on join and filter columns
B Increasing warehouse size
C External tables for intermediate staging
D Time Travel to reference prior states

Correct Answer: A

Explanation:

Option A is correct because clustering keys physically sort data based on frequently filtered and joined columns. This minimizes scanned micro-partitions and enhances query performance on large fact tables. By aligning partitions with common query patterns, Snowflake can prune unnecessary partitions during execution, significantly reducing compute usage and query time.

Option B is not correct because larger warehouses improve raw compute but do not optimize I/O or reduce data scanned.

Option C is not correct because external tables rely on external storage and often have slower performance, not suitable for high-performance joins.

Option D is not correct because Time Travel is for historical querying, not performance optimization.

Clustering keys remain the most effective method for reducing scan times in multi-column joins.

Q117. A Snowflake administrator needs to restrict which rows a user sees based on their department. Which feature enforces row-level security efficiently?

A Row access policies
B Masking policies
C Transient tables
D Streams

Correct Answer: A

Explanation:

Option A is correct because row access policies filter rows dynamically based on user context or session attributes. This enforces department-based visibility without duplicating tables or manually managing subsets. It supports fine-grained security while remaining fully transparent to users’ queries.

Option B is not correct because masking policies control column-level obfuscation, not row-level filtering.

Option C is not correct because transient tables affect storage retention and do not implement access control.

Option D is not correct because streams track changes but do not restrict data visibility.

Row access policies provide the correct method for efficient, secure row-level control.

Q118. A Snowflake team wants to maintain near real-time pipelines with minimal latency. Which combination achieves this?

A Snowpipe with event notifications and tasks consuming streams
B COPY INTO scheduled every hour
C External tables with manual refresh
D Transient tables for staging data

Correct Answer: A

Explanation:

Option A is correct because Snowpipe with event notifications ensures that files arriving in cloud storage are ingested immediately. Tasks consuming streams allow incremental processing of only new or changed rows. This combination provides continuous, low-latency pipelines that are efficient in compute usage and fully Snowflake-native.

Option B is not correct because scheduled COPY INTO introduces fixed latency and inefficiency.

Option C is not correct because external tables rely on refresh intervals and cannot support near real-time pipelines.

Option D is not correct because transient tables impact storage but do not address pipeline latency.

Snowpipe with tasks and streams offers the most efficient real-time solution.

Q119. A Snowflake developer notices that queries frequently involve filtering on a date column in a large historical table. What is the most appropriate performance optimization?

A Cluster the table on the date column
B Use masking policies on the date column
C Convert to a transient table
D Increase warehouse size

Correct Answer: A

Explanation:

Option A is correct because clustering the table on the frequently filtered date column organizes micro-partitions physically by date, enabling efficient pruning. Queries only scan relevant partitions, reducing latency and compute usage. This is especially effective for historical datasets with wide date ranges.

Option B is not correct because masking policies do not improve filtering or query speed.

Option C is not correct because transient tables do not affect partitioning or query performance.

Option D is not correct because increasing warehouse size only adds compute but does not optimize I/O or micro-partition scanning.

Clustering on the date column is the ideal solution.

Q120. A Snowflake administrator wants to reduce costs while ensuring warehouses automatically suspend during idle periods. Which approach is correct?

A Enable auto-suspend with an appropriate idle timeout
B Increase warehouse size to minimize query time
C Convert tables to transient to save compute
D Use network policies to restrict usage

Correct Answer: A

Explanation:

Option A is correct because enabling auto-suspend represents Snowflake’s primary native mechanism for controlling idle compute costs by automatically stopping warehouses after administrator-defined periods of inactivity, effectively eliminating charges for compute resources that remain provisioned but unused when no queries are actively executing. By carefully selecting appropriate idle timeout values—balancing between aggressive suspension for maximum cost savings and more conservative timeouts that reduce startup latency—organizations can optimize the tradeoff between minimizing unnecessary compute charges during idle periods and maintaining responsiveness for interactive workloads where users expect immediate query execution without waiting for warehouse resumption. When queries arrive after a warehouse has been auto-suspended, Snowflake’s auto-resume functionality transparently restarts the warehouse within seconds, loading necessary cache data and resuming query processing with minimal user-perceived delay, creating a cost-efficient operational model where compute capacity exists and incurs charges only during actual usage periods rather than continuously consuming credits during extended idle intervals between query bursts. This approach proves particularly effective for development environments used intermittently throughout workdays, batch processing warehouses executing scheduled jobs with predictable idle gaps, ad-hoc analytical warehouses serving sporadic business intelligence requests, and any scenario where query patterns include substantial periods without activity that would otherwise waste compute resources and unnecessarily inflate operational expenses.
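
For instance (the warehouse name and timeout are assumptions), the setting can be applied directly to an existing warehouse:

ALTER WAREHOUSE reporting_wh SET
    AUTO_SUSPEND = 120      -- suspend after 120 seconds of inactivity
    AUTO_RESUME  = TRUE;    -- restart transparently when the next query arrives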

Option B is incorrect because larger warehouse sizes double per-second credit consumption at each tier. A bigger warehouse only reduces total cost if queries finish proportionally faster, which cannot be guaranteed across diverse workloads, and it does nothing about credits consumed while the warehouse sits idle.

Option C is incorrect because transient tables only affect storage behavior: they limit Time Travel retention and eliminate Fail-safe, which marginally reduces storage costs but has no impact on compute charges, since warehouse credit consumption depends solely on warehouse size and running time, regardless of the table types queried.

Option D misunderstands network policies, which serve security functions by restricting network access to Snowflake accounts based on IP address allowlists or blocklists, controlling who can authenticate and connect to the environment but providing no functionality whatsoever for suspending idle warehouses, managing compute resources, or reducing operational costs, as these access control mechanisms operate independently from warehouse lifecycle management. Therefore, auto-suspend configuration represents the most direct, effective, and Snowflake-native method for controlling idle compute costs in cloud data warehouse environments.
