Member-only story

Snowflake: Choosing The Best Clustering Key

Clark Perucho
5 min readSep 29, 2021

--

Photo by Alex Block

Snowflake takes care of how the data will be distributed across micro-partitions and it is done automatically as we populate tables with data. In general, this process will produce well-clustered objects.

However overtime, as DML statements occurs and the data change happens especially on large tables, the data might no longer cluster optimally on desired dimensions which will result in table’s performance degradation.

For this reason, Snowflake introduced Clustered Tables, these are essentially, tables with clustering key. The clustering key can be any column/s of a table that will be explicitly designated as a basis of how the data will be distributed across micro-partitions.

In this article, we will focus on steps and areas that we need to look at in choosing the right table to be clustered and the right column as clustering key.

Note: The steps provided below will include queries. Adjustment on these queries may be required as it may not be suitable for all database and warehouse architecture, please modify or create your own queries as needed according to your design.

Candidate Table Identification

1. Table size

--

--

Clark Perucho
Clark Perucho

Written by Clark Perucho

Snowflake Data Superhero | Certified SnowPro: Administrator | Teradata 14 Certified Technical Specialist

No responses yet

Write a response