Making Snowflake Faster: Why Understanding Micro-Partitions Really Matters
- Hannah Dowse
- Mar 16
- 3 min read
If you’ve ever wondered why some Snowflake queries feel lightning fast while others seem… less fun, Will Riley’s session shines a big spotlight on the reason. And the good news is that a lot of the magic comes down to things you can influence with a few smart decisions.
Will has been with Snowflake for years and has seen the platform grow from “cloud data warehouse” into “all things AI.” But as he puts it, the fundamentals of storing and retrieving data efficiently haven’t changed. And once you understand how Snowflake handles micro-partitions, clustering, and pruning, you suddenly have a whole lot more control over performance and cost.
Let’s walk through the key ideas in a clear, conversational way.
Snowflake’s Secret Weapon: Micro-Partitions
Everything in Snowflake is stored as micro-partitions. Think of them as small, compressed files, each holding roughly 16 MB of compressed data (50–500 MB uncompressed). They’re:
Automatic
Immutable
Columnar
Versioned
That immutability is what gives you features like Time Travel. Every time you update or delete something, Snowflake doesn’t rewrite old files; it creates new ones and tracks the versions for you.
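To make that concrete, here’s a tiny sketch of Time Travel in action (the orders table name is just an illustration):

```sql
-- Query the table as it looked one hour ago.
SELECT *
FROM orders AT (OFFSET => -3600);

-- Or look at it as of just before a specific statement ran
-- (substitute a real query ID).
SELECT *
FROM orders BEFORE (STATEMENT => '<query_id>');
```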
And here’s the big bit: every micro-partition also stores metadata describing what’s inside it, things like the min and max values for each column.
This means that when you run a query, Snowflake doesn’t just blindly scan all your data. It uses that metadata to figure out which micro-partitions don’t need to be read at all.
That’s pruning. And it’s where performance is won or lost.
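Here’s roughly what that looks like in practice, using an illustrative events table. Because the filter is a plain comparison on a real column, Snowflake can check it against each micro-partition’s min/max metadata and skip the ones that can’t match:

```sql
SELECT COUNT(*)
FROM events
WHERE event_date = '2024-03-01';

-- In the Query Profile, compare "Partitions scanned" against
-- "Partitions total" to see exactly how much pruning you got.
```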
Why the Order You Load Data In Changes Everything
Snowflake automatically tries to keep similar data together when you load it. So if your data naturally arrives in date order (IoT feeds or clickstream data, for example), you get beautiful natural clustering without lifting a finger.
But if you load data in a random order, Snowflake has no choice but to scatter values across micro-partitions, and suddenly pruning becomes much harder.
Will’s demo made this painfully clear.
Take a simple table with a billion rows.
When loaded randomly, a query filtering on a single date took around 25 seconds. When the same table was rebuilt in date order, the exact same query took about 3 seconds.
Same data. Same query. Just loaded in a different order.
That’s roughly an 8x speedup, and an 8x reduction in compute cost.
All because the micro partitions now had clean date ranges instead of jumbled ones.
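If you want to try this on one of your own tables, the rebuild is short. A minimal sketch, assuming an events table that people filter by event_date:

```sql
-- Rebuild into a new table, physically ordered by the filter column.
CREATE OR REPLACE TABLE events_sorted AS
SELECT *
FROM events
ORDER BY event_date;

-- Or re-sort in place: INSERT OVERWRITE truncates and reloads the
-- table, so the new micro-partitions get clean date ranges.
INSERT OVERWRITE INTO events
SELECT * FROM events ORDER BY event_date;
```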
Clustering: When Natural Ordering Isn’t Enough
Sometimes your users filter on more than one column, or your ingest pattern just isn’t naturally ordered.
That’s where clustering keys come in.
Clustering tells Snowflake which columns matter most for pruning. The goal is to get your “average depth” low, ideally 1, meaning Snowflake only needs to scan one micro-partition to find the values it needs.
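You can check where you stand before changing anything. A quick sketch, again with illustrative names:

```sql
-- Returns JSON including "average_depth" for the candidate columns;
-- values close to 1 mean pruning is already working well.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');
```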
But Will makes an important point. You shouldn’t cluster blindly.
Check which predicates people actually use. Check their selectivity. Cluster only on columns that meaningfully reduce the number of partitions scanned.
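If the numbers justify it, defining a key is a one-liner (the column choices here are purely illustrative):

```sql
-- Automatic Clustering then reorganizes micro-partitions in the
-- background (and bills credits for the work), so choose selective,
-- frequently filtered columns.
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- And if it doesn't pay off, you can drop it again:
ALTER TABLE events DROP CLUSTERING KEY;
```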
And absolutely don’t use clustering to fix poor data types. If your dates are stored as strings, you’ll lose pruning no matter what.
Updating Data Can Break Clustering
Because micro-partitions are immutable, any update creates a new one. On big tables this creates churn and slowly destroys natural clustering over time.
That’s why insert-only patterns, like Data Vault-style loading in Snowflake, are so effective. They avoid re-versioning tons of micro-partitions and keep everything nicely organized.
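For a flavor of what that looks like, here’s a sketch of an insert-only “latest version wins” pattern; the customer_sat and staged_customers names are made up for illustration:

```sql
-- Append a new version instead of updating rows in place.
INSERT INTO customer_sat (customer_key, email, load_ts)
SELECT customer_key, email, CURRENT_TIMESTAMP()
FROM staged_customers;

-- Read the latest version per key: no UPDATEs, no partition churn.
SELECT *
FROM customer_sat
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_key ORDER BY load_ts DESC) = 1;
```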
A Few Practical Tips From Will
Will wrapped the session with some simple but powerful takeaways:
Know your workloads and your filter patterns
Check natural clustering before adding cluster keys
Prefer insert only and avoid unnecessary updates
Use proper data types for dates and numbers (see the sketch below)
Rebuild a heavily used table in sorted order; the speedup is often worth the small rebuild cost
Don’t just scale up the warehouse; fix the data layout first
These aren’t complicated tricks but they make an enormous difference.
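On the data types tip, here’s the failure mode and the fix, sketched with illustrative names. Wrapping the filtered column in a function means Snowflake can’t compare the predicate against min/max metadata, so pruning disappears:

```sql
-- Dates stored as strings: the function call on the column defeats
-- pruning, so this tends toward a full scan.
SELECT COUNT(*)
FROM events_raw
WHERE TO_DATE(event_date_str, 'YYYY-MM-DD') = '2024-03-01';

-- Fix the type once at load (or rebuild) time...
CREATE OR REPLACE TABLE events_typed AS
SELECT *, TO_DATE(event_date_str, 'YYYY-MM-DD') AS event_date
FROM events_raw;

-- ...and the same filter can prune on min/max DATE metadata.
SELECT COUNT(*)
FROM events_typed
WHERE event_date = '2024-03-01';
```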
The Big Lesson
Snowflake is fast out of the box, but it’s much faster when your micro-partitions work with you instead of against you. By understanding how data is stored and how pruning works, you can deliver dramatic performance improvements without throwing more compute at the problem.