Member-only story
Snowflake: Choosing The Best Clustering Key

Snowflake takes care of how the data will be distributed across micro-partitions and it is done automatically as we populate tables with data. In general, this process will produce well-clustered objects.
However overtime, as DML statements occurs and the data change happens especially on large tables, the data might no longer cluster optimally on desired dimensions which will result in table’s performance degradation.
For this reason, Snowflake introduced Clustered Tables, these are essentially, tables with clustering key. The clustering key can be any column/s of a table that will be explicitly designated as a basis of how the data will be distributed across micro-partitions.
In this article, we will focus on steps and areas that we need to look at in choosing the right table to be clustered and the right column as clustering key.
Note: The steps provided below will include queries. Adjustment on these queries may be required as it may not be suitable for all database and warehouse architecture, please modify or create your own queries as needed according to your design.