Snowflake SnowPro Core Recertification (COF-R02) Exam Dumps and Practice Test Questions Set 10: Q181-200

Visit here for our full Snowflake SnowPro Core exam dumps and practice test questions.

Q181. An account administrator sets a resource monitor (ACCT_MONITOR) at the account level with a 10,000 credit quota. A warehouse administrator sets a separate resource monitor (WH_MONITOR) on the BI_WAREHOUSE with a 1,000 credit quota. Both monitors are set to SUSPEND_IMMEDIATE. The BI_WAREHOUSE consumes 1,100 credits, while the total account consumption (including the BI_WAREHOUSE) is 5,000 credits. What is the expected outcome?

A) The entire account will be suspended by ACCT_MONITOR.
B) Only the BI_WAREHOUSE will be suspended by WH_MONITOR.
C) Both the BI_WAREHOUSE and the entire account will be suspended.
D) Neither the warehouse nor the account will be suspended.

Answer: B

Explanation: 

Option B is the correct answer. Account-level and warehouse-level resource monitors operate independently: each one triggers its own actions only when its own quota is reached. In this scenario, WH_MONITOR is assigned directly to BI_WAREHOUSE and has a quota of 1,000 credits. When the BI_WAREHOUSE consumes 1,100 credits, it breaches that monitor’s quota. This triggers the SUSPEND_IMMEDIATE action on WH_MONITOR, which suspends only the BI_WAREHOUSE.

Option A is incorrect because the account-level monitor, ACCT_MONITOR, has a 10,000 credit quota. The total account consumption is only 5,000 credits, which has not breached this quota. The account monitor is not triggered.

Option C is incorrect because, as explained above, the account-level monitor’s quota has not been met. Only the warehouse-specific monitor’s quota has been breached.

Option D is incorrect because the BI_WAREHOUSE clearly violated the 1,000 credit quota of the WH_MONITOR that was explicitly assigned to it, so an action (suspension) is definitely expected.
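For reference, a minimal sketch of how the two monitors in this scenario could be defined (names and quotas are taken from the question; the trigger thresholds are an assumption):

-- Account-level monitor
CREATE RESOURCE MONITOR acct_monitor WITH CREDIT_QUOTA = 10000
  TRIGGERS ON 100 PERCENT DO SUSPEND_IMMEDIATE;
ALTER ACCOUNT SET RESOURCE_MONITOR = acct_monitor;

-- Warehouse-level monitor
CREATE RESOURCE MONITOR wh_monitor WITH CREDIT_QUOTA = 1000
  TRIGGERS ON 100 PERCENT DO SUSPEND_IMMEDIATE;
ALTER WAREHOUSE bi_warehouse SET RESOURCE_MONITOR = wh_monitor;

With this setup, only WH_MONITOR’s 1,000-credit threshold is crossed, so only BI_WAREHOUSE is suspended.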

Q182. A 200TB table containing IoT logs in a VARIANT column is queried frequently. Analysts need to perform fast point-lookups to find all records matching a specific device_id (e.g., WHERE log_payload:device_id = ‘abc-123’). These queries are currently slow and result in full table scans. Which feature is specifically designed to accelerate this type of equality search on semi-structured data?

A) A Materialized View that extracts the device_id.
B) A clustering key defined on an expression of log_payload:device_id.
C) The Search Optimization Service.
D) A standard view that flattens the log_payload.

Answer: C

Explanation: 

Option C is the correct answer. The Search Optimization Service (SOS) is a performance-tuning feature explicitly built to accelerate point-lookup (equality), substring, and IN clause queries on very large tables. Critically, it is designed to work on high-cardinality columns and paths within semi-structured VARIANT data, which is the exact use case described (WHERE log_payload:device_id = ‘…’). It builds a persistent, optimized search index in the background to avoid full table scans.

Option A, a Materialized View, could be used to extract the device_id, but it is a heavier solution. An MV pre-computes the results of a query. While querying the MV would be fast, it adds storage cost for the pre-computed results and compute cost for maintenance. SOS is a more direct and often more efficient solution for just accelerating lookups on the base table.

Option B, a clustering key, is not ideal for this scenario. Clustering keys work best on columns with a moderate number of distinct values, or on columns used in range scans (like dates). A device_id is likely to be extremely high-cardinality, so clustering on it (or an expression of it) would provide limited partition pruning relative to the ongoing cost of keeping the table re-clustered.

Option D, a standard view, provides no performance benefit. A view is just a stored query. Querying the view would still execute the full table scan on the base table at runtime, which is the problem that needs to be solved.
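As a rough sketch, search optimization can be enabled for equality lookups on the VARIANT path like this (the table name iot_logs is assumed, since the question does not name the table):

ALTER TABLE iot_logs ADD SEARCH OPTIMIZATION ON EQUALITY(log_payload:device_id);
-- Check the configuration and build progress:
DESCRIBE SEARCH OPTIMIZATION ON iot_logs;

Once the background build completes, equality predicates such as WHERE log_payload:device_id = 'abc-123' can use the search access path instead of scanning every micro-partition.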

Q183. A company has a 10TB table. The table’s DATA_RETENTION_TIME_IN_DAYS is set to 1. After 24 hours, an administrator accidentally drops the table. Eight days later, the team realizes they need the data. They contact Snowflake Support to recover the data from Fail-safe. Which statement is true regarding the storage costs associated with this scenario?

A) Storage costs are only incurred for the 1-day Time Travel period.
B) Storage costs are incurred for both the 1-day Time Travel and the 7-day Fail-safe period.
C) No storage costs are incurred after the table is dropped, as it enters a free recovery window.
D) Fail-safe is a free, value-added service and does not incur storage costs.

Answer: B

Explanation:

Option B is the correct answer. Snowflake bills for all data stored, regardless of its state (active, Time Travel, or Fail-safe). When the administrator drops the table, it first enters the 1-day Time Travel period, during which the customer is billed for that 10TB of storage. After the 1-day Time Travel window expires, the data transitions into the 7-day Fail-safe period (a fixed, non-configurable period for permanent tables). The customer is also billed for the 10TB of storage for these 7 days. Therefore, the company incurs storage costs for the entire 8-day period (1 day of Time Travel + 7 days of Fail-safe) until the data is permanently purged.

Option A is incorrect because it ignores the storage costs associated with the 7-day Fail-safe period.

Option C and Option D are both incorrect. Fail-safe is not a free service. It is a data recovery mechanism, but the underlying storage consumed by the data in the Fail-safe state is billable, just like active storage or Time Travel storage.
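To see where those charges come from, the ACCOUNT_USAGE.TABLE_STORAGE_METRICS view breaks storage down by state; a sketch (the database name MY_DB is a placeholder):

SELECT table_name, active_bytes, time_travel_bytes, failsafe_bytes, table_dropped
FROM snowflake.account_usage.table_storage_metrics
WHERE table_catalog = 'MY_DB'
  AND deleted = TRUE;   -- dropped tables whose storage is still billable

A dropped 10TB table would show its bytes first under TIME_TRAVEL_BYTES and then under FAILSAFE_BYTES until the Fail-safe window expires.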

Q184. A data engineer is processing a stream on a USERS table. They see two rows in the stream for user_id = 123. One row has METADATA$ACTION = ‘DELETE’ and METADATA$ISUPDATE = ‘TRUE’. The other row has METADATA$ACTION = ‘INSERT’ and METADATA$ISUPDATE = ‘TRUE’. What single DML operation occurred on the USERS table to produce these two stream records?

A) DELETE FROM USERS WHERE user_id = 123
B) INSERT INTO USERS … followed by a DELETE FROM USERS …
C) UPDATE USERS SET … WHERE user_id = 123
D) MERGE INTO USERS … that resulted in a delete operation.

Answer: C

Explanation: Option C is the correct answer. This is the classic signature of an UPDATE operation as captured by a standard Snowflake stream. An UPDATE is not recorded as a single “update” row. Instead, it is registered as a pair of records: first, a DELETE record representing the state of the row before the update, and second, an INSERT record representing the state of the row after the update. The key identifier is the METADATA$ISUPDATE = ‘TRUE’ column, which flags both of these records as being part of a single UPDATE operation.

Option A, a simple DELETE, would produce only one row in the stream: METADATA$ACTION = ‘DELETE’ and METADATA$ISUPDATE = ‘FALSE’.

Option B, an INSERT followed by a DELETE, would produce two rows, but they would both have METADATA$ISUPDATE = ‘FALSE’.

Option D, a MERGE, is a possibility, as a MERGE can perform UPDATEs. However, the most direct, common, and unambiguous DML operation that produces this exact signature is UPDATE.
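A quick way to reproduce this signature (the stream name and column names are assumed for illustration):

CREATE OR REPLACE STREAM users_stream ON TABLE users;

UPDATE users SET email = 'new@example.com' WHERE user_id = 123;

SELECT user_id, metadata$action, metadata$isupdate FROM users_stream;
-- Returns two rows for user_id 123: (DELETE, TRUE) and (INSERT, TRUE)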

Q185. A COPY INTO command is loading 100 CSV files from a stage. The business requires that the load must not stop, even if some files contain formatting errors or data type mismatches. All valid rows from all files should be loaded. However, if a file is found to have any error, the entire file should be skipped, and the load should move to the next file. Which ON_ERROR setting achieves this?

A) ON_ERROR = ‘ABORT_STATEMENT’
B) ON_ERROR = ‘CONTINUE’
C) ON_ERROR = ‘SKIP_FILE’
D) ON_ERROR = ‘SKIP_FILE_5’

Answer: C

Explanation: Option C is the correct answer. The ON_ERROR = ‘SKIP_FILE’ option does exactly what the requirement asks for. When the COPY command encounters the first data error in a file (e.g., a data type mismatch), it immediately stops processing that file, discards any rows it might have loaded from it, and moves on to the next file in the list. This ensures that a single bad file does not stop the entire batch.

Option A, ON_ERROR = ‘ABORT_STATEMENT’, is the default. It would stop the entire COPY operation on the first error found in any file, which is the opposite of the requirement.

Option B, ON_ERROR = ‘CONTINUE’, is a different error-handling strategy. It would skip the invalid rows within a file but would continue processing the rest of the file and load all valid rows from it. The requirement was to skip the entire file if any error was found.

Option D, ON_ERROR = ‘SKIP_FILE_5’, skips a file only after 5 error rows have been found in it (the percentage form, such as ‘SKIP_FILE_5%’, skips a file once 5% of its rows contain errors). The requirement was to skip the file on the first error.
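A hedged example of the required setting (the table, stage, and file format details are placeholders):

COPY INTO target_table
FROM @my_csv_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
ON_ERROR = 'SKIP_FILE';
-- Any file containing at least one bad row is skipped entirely; clean files load in full.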

Q186. A company’s data analytics team runs a complex financial modeling query at the end of every month. The query is a single, massive query that takes 4 hours to run on a ‘Medium’ warehouse. The team wants to reduce this runtime to under 1 hour. Concurrency (the number of simultaneous queries) is not an issue. What is the most appropriate action to take?

A) Change the warehouse scaling policy from ‘Standard’ to ‘Economy’.
B) Increase the warehouse’s maximum cluster count from 1 to 4.
C) Resize the warehouse from ‘Medium’ to ‘X-Large’ before running the query.
D) Create a new, separate ‘Medium’ warehouse just for this query.

Answer: C

Explanation: Option C is the correct answer. This is a classic “vertical scaling” problem. The issue is a single, large, complex query that is slow. Resizing the warehouse to a larger size (e.g., from Medium to X-Large) provides more compute resources (CPU, memory, and temporary disk I/O) to that single query, allowing it to execute more operations in parallel and finish faster.

Option A and Option B are solutions for “horizontal scaling,” which addresses concurrency. Increasing the cluster count (B) or adjusting the scaling policy (A) allows the warehouse to handle more queries at the same time by adding more clusters. Since the problem states concurrency is not the issue, these options are incorrect and would not speed up the single query.

Option D, creating a new ‘Medium’ warehouse, does nothing to solve the performance problem. The query would still take 4 hours to run, just on a different ‘Medium’ warehouse.
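In practice this is a single statement run before the monthly job (the warehouse name is a placeholder):

ALTER WAREHOUSE finance_wh SET WAREHOUSE_SIZE = 'XLARGE';
-- ... run the monthly modeling query ...
ALTER WAREHOUSE finance_wh SET WAREHOUSE_SIZE = 'MEDIUM';   -- scale back down afterwards

Resizing can be done even while the warehouse is running; the new size applies to queries that start after the change.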

Q187. A data governance team needs to implement a policy where sales managers can only see data belonging to their own region. The SALES table has a REGION column, and a separate MANAGER_REGION_MAPPING table maps CURRENT_USER() to a region. The team wants a solution that is centrally managed and automatically filters rows for any query against the SALES table, without requiring users to query a specific view. What feature should be used?

A) A Secure View.
B) A Dynamic Data Masking policy.
C) A Row-Level Access Policy.
D) Granting SELECT privileges only on a standard view.

Answer: C

Explanation: Option C is the correct answer. This is the textbook use case for a row access policy, Snowflake’s row-level security feature. A row access policy is a schema-level object that is attached to the base table (SALES). The policy contains an expression (e.g., one that looks up the user’s region in the MANAGER_REGION_MAPPING table). At query time, Snowflake’s optimizer automatically and transparently applies this policy’s filter to any query a user runs against the SALES table. This achieves the goal of filtering rows without requiring the user to query a special view.

Option A and Option D involve creating a view. While a view could be built to contain the filtering logic, it fails the requirement. The user would have to be trained to query V_SALES_SECURE instead of SALES. RLS is superior because it protects the base table directly.

Option B, Dynamic Data Masking, is the wrong type of security. Masking is used to redact or obscure column values (e.g., show *** instead of a social security number). It does not filter or hide entire rows.
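A minimal sketch of such a policy, assuming the MANAGER_REGION_MAPPING table has MANAGER_NAME and REGION columns (those column names are assumptions):

CREATE OR REPLACE ROW ACCESS POLICY region_policy
  AS (region_value VARCHAR) RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM manager_region_mapping m
    WHERE m.manager_name = CURRENT_USER()
      AND m.region = region_value
  );

ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);

After the ALTER TABLE, every query against SALES is automatically filtered to the rows whose REGION matches the current user’s mapping.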

Q188. A developer is creating a table to store JSON data. The source JSON can be either a single JSON object (e.g., {“a”: 1}) or a JSON array (e.g., [1, 2, 3]). The data type of the root element is unpredictable. Which column data type must be used to store this data without loss or error?

A) OBJECT
B) ARRAY
C) VARIANT
D) VARCHAR

Answer: C

Explanation: Option C is the correct answer. The VARIANT data type is Snowflake’s “super-type” for semi-structured data. It is specifically designed to store any valid JSON value, including JSON objects, JSON arrays, or scalar values (strings, numbers, booleans, nulls). Since the incoming data can be either an object or an array, VARIANT is the only type that can handle both.

Option A, OBJECT, is a data type that can only store JSON objects (key-value pairs enclosed in {…}). If the source data was an array ([…]), attempting to load it into an OBJECT column would result in an error.

Option B, ARRAY, is a data type that can only store JSON arrays ([…]). If the source data was an object ({…}), attempting to load it into an ARRAY column would result in an error.

Option D, VARCHAR, could technically store the data as a raw string. However, this is a poor practice as the data would not be queryable using Snowflake’s optimized semi-structured functions (e.g., col:path). Every query would require a PARSE_JSON function, which is inefficient.
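A small sketch showing why VARIANT handles both shapes (the table name is a placeholder):

CREATE OR REPLACE TABLE json_landing (raw_payload VARIANT);

INSERT INTO json_landing
  SELECT PARSE_JSON('{"a": 1}')
  UNION ALL
  SELECT PARSE_JSON('[1, 2, 3]');
-- Both rows load successfully; an OBJECT or ARRAY column would reject one of them.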

Q189. A warehouse administrator needs to run a query to find the total credit consumption of all warehouses in the account yesterday. The query must be fast and must not require a running virtual warehouse to execute. Which schema and view should be used?

A) INFORMATION_SCHEMA.WAREHOUSE_METERING_HISTORY
B) ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
C) INFORMATION_SCHEMA.QUERY_HISTORY
D) ACCOUNT_USAGE.QUERY_HISTORY

Answer: B

Explanation: Option B is the correct answer. The ACCOUNT_USAGE schema is a special, account-wide schema that contains historical metadata for the entire account. The WAREHOUSE_METERING_HISTORY view within it contains the credit consumption data needed. Crucially, queries against any view in the ACCOUNT_USAGE schema are processed by the Cloud Services layer and do not require a running virtual warehouse, fulfilling a key requirement. This schema contains data with a slight latency (e.g., up to 45-180 minutes) but stores it for 365 days, making it perfect for “yesterday’s” data.

Option A and Option C use the INFORMATION_SCHEMA. Queries against the INFORMATION_SCHEMA do require a running, active warehouse. Furthermore, the INFORMATION_SCHEMA retains only a short history (for example, its QUERY_HISTORY table functions return at most the last 7 days), so while it could retrieve yesterday’s data, it fails the “no warehouse” requirement.

Option D, ACCOUNT_USAGE.QUERY_HISTORY, would also not require a warehouse, but it provides data on a per-query basis. While you could sum the credits from this view, WAREHOUSE_METERING_HISTORY (Option B) is the direct, pre-aggregated, and correct view for warehouse-level credit consumption.
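A sketch of the query itself:

SELECT warehouse_name,
       SUM(credits_used) AS credits_used
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -1, DATE_TRUNC('day', CURRENT_TIMESTAMP()))
  AND start_time <  DATE_TRUNC('day', CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_used DESC;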

Q190. A data engineer needs to create a task (TASK_B) that runs only after another task (TASK_A) has successfully completed. TASK_A is scheduled to run daily at 8:00 AM. What is the most efficient way to define TASK_B?

A) Schedule TASK_B to run at 8:05 AM using SCHEDULE = ‘5 MINUTE’.
B) Define TASK_B with AFTER TASK_A in its definition.
C) Define TASK_B with WHEN SYSTEM$TASK_A_STATUS = ‘SUCCEEDED’.
D) Use a stream on TASK_A’s output table to trigger TASK_B.

Answer: B

Explanation: Option B is the correct answer. Snowflake Tasks support predecessor-based scheduling, allowing for the creation of Directed Acyclic Graphs (DAGs). By defining TASK_B with the AFTER TASK_A clause, you create a direct dependency: once both tasks are resumed, TASK_B runs only after its predecessor (TASK_A) finishes successfully. This is the most robust and efficient method, as it is not dependent on clock time and automatically handles the case where TASK_A runs longer than expected.

Option A is a “time-based” dependency and is a poor practice. If TASK_A takes 6 minutes to run one day, TASK_B will run before TASK_A is finished, leading to incorrect data or errors.

Option C is not valid Snowflake SQL syntax for a task definition. The WHEN clause is used for boolean expressions (like SYSTEM$STREAM_HAS_DATA), not for checking the status of other tasks.

Option D is overly complex and indirect. While a stream could be used to trigger a task (using WHEN SYSTEM$STREAM_HAS_DATA(…)), it’s designed for reacting to data changes in a table, not for managing a task’s execution workflow.
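A minimal sketch of the dependency (the warehouse name and task body are placeholders):

CREATE OR REPLACE TASK task_b
  WAREHOUSE = etl_wh
  AFTER task_a
AS
  CALL refresh_daily_summary();   -- hypothetical stored procedure

-- Tasks are created suspended; resume the child before (or together with) the root.
ALTER TASK task_b RESUME;
ALTER TASK task_a RESUME;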

Q191. A company is migrating its data ingestion from the Kafka Connector (which uses Snowpipe) to the new Snowpipe Streaming API. What is the primary architectural difference and benefit of this change?

A) Snowpipe Streaming uses a “bring-your-own-warehouse” model, giving more control over compute.
B) Snowpipe Streaming writes row-sets directly to Snowflake tables, bypassing intermediate cloud storage files.
C) Snowpipe Streaming uses the COPY command, which is more robust than Snowpipe’s INSERT command.
D) Snowpipe Streaming provides lower latency by creating larger, more optimized Parquet files.

Answer: B

Explanation: Option B is the correct answer. The key architectural shift with Snowpipe Streaming is the elimination of intermediate file staging. Traditional Snowpipe (and the Kafka connector built on it) works by batching messages, writing them to files (e.g., Parquet) in an internal or external stage, and then using the COPY command to load those files. Snowpipe Streaming uses a new SDK to write row-sets directly from the client application (e.g., Kafka) into the target Snowflake table over streaming channels, completely bypassing the file creation step. This avoidance of file staging is what dramatically reduces latency from “minutes” to “seconds.”

Option A is false. Snowpipe Streaming uses serverless, Snowflake-managed compute, similar to Snowpipe, but it is billed for that serverless compute (plus a per-client charge) rather than Snowpipe’s per-file overhead. It does not use a user-managed virtual warehouse.

Option C is backward. Traditional Snowpipe uses the COPY command. Snowpipe Streaming uses a new, non-COPY ingestion method.

Option D is false. The benefit of Snowpipe Streaming is that it avoids intermediate files altogether; it does not create larger or more optimized files.

Q192. A developer is working on their local laptop and has a file named data.csv in their C:\temp\ directory. They need to upload this file to a Snowflake internal named stage called my_stage. Which client and command must be used?

A) The Snowsight UI, using the “Load Data” wizard.
B) The PUT command, executed from the Snowsight UI worksheet.
C) The PUT command, executed using the SnowSQL client.
D) The COPY INTO @my_stage command, executed from SnowSQL.

Answer: C

Explanation: Option C is the correct answer. The PUT command is the specific command used to upload files from a local client file system (like C:\temp\) to a Snowflake internal stage (a named internal stage, a table stage, or a user stage). This command cannot be run from the web-based Snowsight UI worksheet because the command executes on the client and the Snowflake service has no access to your local file system. It must be run from a client-side tool like SnowSQL (or via a driver like the Python Connector).

Option A, the “Load Data” wizard, is a graphical tool that does perform this action, but it’s a wizard, not a command. Behind the scenes, the wizard is likely using a PUT command, but the user does not type the command. Option C is the direct command-line method.

Option B is impossible. Running PUT C:\temp\data.csv @my_stage; in a Snowsight worksheet will fail, as the Snowflake server executing the worksheet has no access to the user’s local C:\ drive.

Option D is the wrong command. COPY INTO … is used to load data from a stage into a table, or to unload from a table to a stage. It is not used to upload from a local machine.
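The actual commands, run from a SnowSQL session on the laptop (not from a Snowsight worksheet):

PUT file://C:\temp\data.csv @my_stage AUTO_COMPRESS=TRUE;
LIST @my_stage;   -- confirm the (gzip-compressed) file landed in the stage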

Q193. A data architect is proposing a Materialized View (MV) to speed up a complex analytics dashboard. Which of the following is a valid limitation that would prevent the MV from being created?

A) The MV’s query contains a non-deterministic function like CURRENT_TIMESTAMP().
B) The MV’s query contains a GROUP BY clause.
C) The MV’s query is defined on a table with a VARIANT column.
D) The MV’s query contains a filter (WHERE clause) on a date column.

Answer: A

Explanation: Option A is a core limitation of Materialized Views. An MV stores a pre-computed result set. This is only possible if the result set is deterministic, meaning the output of the query depends only on the data in the base tables. A non-deterministic function like CURRENT_TIMESTAMP() or RANDOM() produces a different value every time it is run, so it is prohibited in an MV definition because the “correct” value to materialize would be ambiguous and constantly changing.

Option B is incorrect. MVs are excellent for pre-aggregating data and are very commonly used with GROUP BY clauses. This is one of their primary use cases.

Option C is incorrect. MVs can be defined on tables with VARIANT columns, and they can even access paths within the VARIANT data.

Option D is incorrect. A WHERE clause filter is fully supported in a materialized view definition and is a common way to materialize only the relevant subset of a large table. (Joins, by contrast, are a real limitation: a Snowflake materialized view can reference only a single table.)
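A hedged sketch of what is and is not accepted (object names are placeholders):

-- Allowed: deterministic filtering and aggregation over a single base table.
CREATE OR REPLACE MATERIALIZED VIEW daily_sales_mv AS
  SELECT sale_date, region, SUM(amount) AS total_amount
  FROM sales
  WHERE sale_date >= '2024-01-01'
  GROUP BY sale_date, region;

-- Rejected: CURRENT_TIMESTAMP() is non-deterministic.
-- CREATE MATERIALIZED VIEW bad_mv AS
--   SELECT sale_date, CURRENT_TIMESTAMP() AS captured_at FROM sales;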

Q194. A data provider (Company P) shares data with a data consumer (Company C) using a Snowflake reader account. Company C runs several large queries, consuming 100 credits of compute on a warehouse within the reader account. Who is responsible for paying the invoice for these 100 credits?

A) Company C, because they are the consumer that ran the queries.
B) Company P, because they are the provider that owns the reader account.
C) The cost is split 50/50 between Company P and Company C by Snowflake.
D) Snowflake, as compute within a reader account is free up to a certain limit.

Answer: B

Explanation: Option B is the correct answer. This is the fundamental billing model of a reader account. A reader account is a special type of account created by and owned by the data provider (Company P). The provider gives access to this account to the consumer (Company C). All compute and storage costs generated within that reader account are billed directly to the provider (Company P). It is then up to Company P to decide if they want to pass those costs on to Company C through a separate, external invoice, but as far as Snowflake is concerned, the provider pays.

Option A is incorrect. The consumer (Company C) in a reader account setup never receives a bill from Snowflake.

Option C is incorrect. Snowflake does not have a 50/50 split model; the billing is 100% to the provider.

Option D is incorrect. While creating the account is free, the consumption (compute and storage) within it is not free and is billed at standard Snowflake rates.
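For context, it is the provider who creates (and therefore pays for) the reader account; a sketch (account name and credentials are placeholders):

CREATE MANAGED ACCOUNT reader_for_company_c
  ADMIN_NAME = company_c_admin,
  ADMIN_PASSWORD = 'Str0ngPlaceholder!',
  TYPE = READER;

Because the reader account is a managed object inside Company P’s account, every credit consumed in it rolls up to Company P’s invoice.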

Q195. A table RAW_LOGS has a VARIANT column JSON_PAYLOAD that contains a JSON object with an array of error codes: {“errors”: [404, 500, 401]}. A user needs to write a query that “un-nests” this array, producing three separate rows: one for 404, one for 500, and one for 401. Which function or operator is required?

A) PARSE_JSON(JSON_PAYLOAD:errors)
B) OBJECT_CONSTRUCT(‘code’, …)
C) LATERAL FLATTEN(input => JSON_PAYLOAD:errors)
D) ARRAY_TO_STRING(JSON_PAYLOAD:errors)

Answer: C

Explanation: Option C is the correct answer. The FLATTEN function is a table function specifically designed to un-nest semi-structured data. The input parameter points to the array that needs to be “flattened” (in this case, JSON_PAYLOAD:errors). It is used with the LATERAL keyword to join the new rows produced by FLATTEN (one for each element in the array) back to the original row from RAW_LOGS. The value column produced by FLATTEN will contain 404, 500, and 401 on separate rows.

Option A, PARSE_JSON, is used to convert a VARCHAR into a VARIANT. The data is already in a VARIANT column.

Option B, OBJECT_CONSTRUCT, is used to create a JSON object, which is the opposite of the requirement.

Option D, ARRAY_TO_STRING, would convert the array [404, 500, 401] into a single string like ‘404,500,401’, not into three separate rows.
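The query pattern looks like this:

SELECT f.value::INT AS error_code
FROM raw_logs r,
     LATERAL FLATTEN(input => r.json_payload:errors) f;
-- For {"errors": [404, 500, 401]} this returns three rows: 404, 500, 401.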

Q196. A data architect has just defined a new clustering key on a 50TB table that is actively receiving 1TB of new data per day. What are the two primary sources of cost associated with maintaining this clustering key?

A) Query costs for users reading the table and data transfer (egress) costs.
B) Compute costs for the automatic clustering service and additional storage costs from data churn.
C) Software license fees for the clustering feature and compute costs from the RECLUSTER command.
D) Storage costs for the clustering metadata and user management costs for the clustering role.

Answer: B

Explanation: Option B correctly identifies the two main cost drivers for automatic clustering. 1) Compute: Snowflake runs a serverless, background service (the Automatic Clustering service) to perform the re-clustering. This service consumes compute credits, which are billed to the account. It rewrites micro-partitions to improve the data’s physical layout. 2) Storage: When this service rewrites data, it creates new micro-partitions and marks the old ones for deletion. However, due to Time Travel, these old partitions are retained for the duration of the Time Travel window (e.g., 1 day, 7 days, etc.). This “data churn” temporarily increases the total storage footprint of the table, as both the old and new versions of the data co-exist.

Option A is incorrect. Clustering reduces query costs; it doesn’t cause them. Egress costs are unrelated.

Option C is incorrect. There is no separate “license fee” for clustering; its cost is simply the serverless credits consumed by the Automatic Clustering service. Also, manual re-clustering via ALTER TABLE … RECLUSTER is deprecated in favor of that automatic background service.

Option D is incorrect. Metadata storage is negligible and part of the standard Snowflake storage cost. User management costs are not a factor.
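Both cost drivers can be tracked; a sketch (the table and key names are hypothetical, since the question does not name them):

ALTER TABLE event_logs CLUSTER BY (event_date);

SELECT start_time, end_time, credits_used, num_bytes_reclustered
FROM snowflake.account_usage.automatic_clustering_history
WHERE table_name = 'EVENT_LOGS'
ORDER BY start_time DESC;

The CREDITS_USED column captures the serverless compute cost, while the temporary storage growth from churned micro-partitions shows up as Time Travel storage in TABLE_STORAGE_METRICS.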

Q197. A user submits a query to a ‘Large’, single-cluster warehouse. The query is immediately put into a ‘Queued’ state. The user checks the warehouse load and sees it is ‘Running’ and 100% utilized. What is the most likely reason the query is queued?

A) The warehouse is suspended, and the AUTO_RESUME parameter is set to FALSE.
B) The user’s role does not have the USAGE privilege on the warehouse.
C) The warehouse is configured in ‘Economy’ mode and is waiting to start a new cluster.
D) The warehouse is single-cluster and is already running the maximum number of concurrent queries it can handle.

Answer: D

Explanation: Option D is the correct answer. Every virtual warehouse, based on its size (e.g., ‘Large’), has a finite number of “slots” or resources to execute queries concurrently. The prompt states the warehouse is ‘single-cluster’ and ‘100% utilized’. This means all its available resources are already consumed by other actively running queries. When a new query arrives at a fully saturated single-cluster warehouse, Snowflake has no choice but to place it in a queue until one of the running queries finishes and frees up resources.

Option A is incorrect because the warehouse load shows the warehouse is already ‘Running’, so it is not suspended and AUTO_RESUME is not a factor here.

Option B is incorrect. If the user lacked USAGE privileges, the query would fail immediately with a permissions error, not queue.

Option C is incorrect because the prompt explicitly states the warehouse is single-cluster. Scaling policies like ‘Economy’ mode are only relevant for multi-cluster warehouses.

Q198. An administrator wants to measure the “health” or “effectiveness” of a clustering key on the TRANSACTIONS table without re-running the clustering process. Which system function provides a metric for how well-clustered a table is?

A) SYSTEM$CLUSTERING_INFORMATION
B) SYSTEM$AUTOMATIC_CLUSTERING_HISTORY
C) SYSTEM$CLUSTERING_DEPTH
D) GET_DDL(‘table’, ‘TRANSACTIONS’)

Answer: C

Explanation: Option C is the correct answer. The SYSTEM$CLUSTERING_DEPTH function is a lightweight, metadata-based function that calculates the average “depth” of a table based on its clustering key. The depth represents the average number of overlapping micro-partitions that Snowflake might need to scan for a query on the clustering key. A low number (e.g., 1.0) is ideal, indicating perfect clustering. A high number (e.g., 100.0) indicates poor clustering. This function is the primary tool for measuring cluster health.

Option A, SYSTEM$CLUSTERING_INFORMATION, is a related and more comprehensive function. It returns a JSON object that includes the clustering depth, but also provides other details. SYSTEM$CLUSTERING_DEPTH is the specific function that just returns the depth metric itself.

Option B, SYSTEM$AUTOMATIC_CLUSTERING_HISTORY, is a function that shows the history of the Automatic Clustering service, including how many credits were consumed and how much data was re-clustered. It shows the cost of clustering, not its effectiveness.

Option D, GET_DDL, simply returns the CREATE TABLE statement and shows that a clustering key is defined, but provides no metric on how well the data is physically organized.
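Both functions are simple metadata calls:

SELECT SYSTEM$CLUSTERING_DEPTH('TRANSACTIONS');
-- e.g., a result of 2.4 means an average of roughly 2.4 overlapping micro-partitions

SELECT SYSTEM$CLUSTERING_INFORMATION('TRANSACTIONS');
-- returns a JSON document with the depth plus a partition-overlap histogram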

Q199. A data scientist needs to load a large (5 million row) Pandas DataFrame named df into a new Snowflake table named MY_TABLE. Which Python Connector function is the most performant and efficient way to accomplish this?

A) Looping through the DataFrame and executing INSERT for each row.
B) Using the pandas.to_sql() function with a standard SQLAlchemy engine.
C) Using the snowflake.connector.pandas_tools.write_pandas() function.
D) Converting the DataFrame to CSV, saving it locally, using PUT, and then COPY INTO.

Answer: C

Explanation: Option C is the correct answer. The snowflake.connector.pandas_tools.write_pandas() function is a purpose-built, highly-optimized utility within the Snowflake Python Connector. It is designed specifically for this task. Under the hood, it intelligently serializes the in-memory DataFrame (e.g., to high-performance Parquet format), streams it to an internal stage, and then executes a bulk COPY INTO command to load the data, all in one efficient operation. This provides the best performance.

Option A is the absolute worst performance. Looping and executing 5 million INSERT statements would involve 5 million network round-trips and would be unacceptably slow.

Option B, the generic pandas.to_sql() function, is not optimized for Snowflake and typically defaults to slow, row-by-row INSERTs or INSERT … VALUES (…) batches, which are far less efficient than Snowflake’s COPY command.

Option D describes the manual process that write_pandas automates. While this would be faster than Option A or B, it is more complex, requires local disk I/O, and is less efficient than the streamlined, in-memory process that write_pandas provides.

Q200. A junior DBA accidentally runs DROP SCHEMA PUBLIC; on a production database. The database’s data retention period is 10 days. The PUBLIC schema contained 50 tables. What is the fastest and most complete way to recover the schema and all 50 tables it contained?

A) Restore the tables one by one using UNDROP TABLE <table_name>.
B) Run the command UNDROP SCHEMA PUBLIC;.
C) Contact Snowflake Support to restore the schema from Fail-safe.
D) Re-create the schema and all 50 tables from the last backup file.

Answer: B

Explanation: Option B is the correct answer. Snowflake’s Time Travel feature applies not just to data (DML) but also to database objects (DDL). When a schema is dropped, the schema and all the objects it contained are retained for the data retention period. The UNDROP SCHEMA command is a single, instantaneous, metadata-only operation that will restore the schema and all 50 tables it contained, exactly as they were at the moment before the DROP command was issued.

Option A is incorrect. You cannot UNDROP the tables because their parent schema (PUBLIC) no longer exists. You would have to re-create the schema first, and even then, restoring 50 tables one at a time is far slower than a single UNDROP SCHEMA.

Option C is incorrect. This is not a Fail-safe operation. Fail-safe is a last-resort recovery (for data, not all objects) after the Time Travel window has expired. Since the drop just happened, the schema is still in Time Travel.

Option D is the “legacy” way of thinking from traditional database systems. Snowflake’s Time Travel feature is designed to make this slow, manual process of restoring from backups obsolete.
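The recovery itself is one command (assuming the session is set to the affected production database):

UNDROP SCHEMA public;
SHOW TABLES IN SCHEMA public;   -- all 50 tables reappear with their data intact

Because the drop happened well within the 10-day retention period, the restore is instantaneous and metadata-only.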
