
Making Snowflake Faster: Why Understanding Micro Partitions Really Matters

  • Hannah Dowse
  • Mar 16
  • 3 min read

If you’ve ever wondered why some Snowflake queries feel lightning fast while others seem… less fun, Will Riley’s session shines a big spotlight on the reason. And the good news is that a lot of the magic comes down to things you can influence with a few smart decisions.


Will has been with Snowflake for years and has seen the platform grow from “cloud data warehouse” into “all things AI.” But as he puts it, the fundamentals of storing and retrieving data efficiently haven’t changed. And once you understand how Snowflake handles micro partitions, clustering and pruning, you suddenly have a whole lot more control over performance and cost.


Let’s walk through the key ideas in a clear, conversational way.


Snowflake’s Secret Weapon: Micro Partitions

Everything in Snowflake is stored as micro partitions. Think of them as small, compressed files about 16 MB each. They’re:


  • Automatic

  • Immutable

  • Columnar

  • Versioned


That immutability is what gives you features like time travel. Every time you update or delete something, Snowflake doesn’t rewrite old files; it creates new ones and tracks versions for you.


And here’s the big bit: every micro partition also stores metadata describing what’s inside it, things like min and max values per column.


This means when you run a query, Snowflake doesn’t just blindly scan all your data. It uses that metadata to figure out which micro partitions don’t need reading.


That’s pruning. And it’s where performance is won or lost.
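The idea can be sketched in a few lines of plain Python. This is an illustration of the pruning concept, not Snowflake internals; the partition layout and the `partitions_to_scan` helper are invented for the example.

```python
# Each "micro partition" carries min/max metadata per column.
# A filter can skip any partition whose range can't contain the value.
partitions = [
    {"min_date": "2024-01-01", "max_date": "2024-01-31", "rows": 1_000_000},
    {"min_date": "2024-02-01", "max_date": "2024-02-29", "rows": 1_000_000},
    {"min_date": "2024-03-01", "max_date": "2024-03-31", "rows": 1_000_000},
]

def partitions_to_scan(parts, date):
    """Keep only partitions whose min/max range could contain `date`."""
    return [p for p in parts if p["min_date"] <= date <= p["max_date"]]

hits = partitions_to_scan(partitions, "2024-02-15")
print(len(hits))  # 1 — only one of three partitions needs reading
```

Because the date ranges don’t overlap, a single-date filter touches one file instead of three; that skipping, done from metadata alone, is exactly what pruning buys you.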


Why the Order You Load Data In Changes Everything

Snowflake automatically tries to keep similar data together when you load it. So if your data naturally arrives in date order (for example, IoT feeds or clickstream data), you get beautiful natural clustering without lifting a finger.


But if you load data in a random order, Snowflake has no choice but to scatter values across micro partitions, and suddenly pruning becomes much harder.

Will’s demo made this painfully clear.


Take a simple table with a billion rows.


When loaded randomly, a query filtering on one date took around 25 seconds. When the same table was rebuilt in date order, the exact same query took about 3 seconds.


Same data. Same query. Just loaded in a different order.


That’s an 8x speed up. And an 8x reduction in compute cost.


All because the micro partitions now had clean date ranges instead of jumbled ones.
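You can see the same effect in miniature with a quick simulation. This is a toy model (chunking a load stream into fixed-size “partitions” and recording min/max per chunk), not Snowflake itself, and the function names are made up:

```python
import random

def build_partitions(values, rows_per_partition=100):
    """Chunk a load stream into 'micro partitions' and record min/max metadata."""
    parts = []
    for i in range(0, len(values), rows_per_partition):
        chunk = values[i:i + rows_per_partition]
        parts.append((min(chunk), max(chunk)))
    return parts

def scanned_fraction(parts, value):
    """Fraction of partitions a point filter can't prune away."""
    hits = sum(1 for lo, hi in parts if lo <= value <= hi)
    return hits / len(parts)

days = [d for d in range(365) for _ in range(10)]  # 10 rows per day

sorted_parts = build_partitions(sorted(days))
random.seed(0)
shuffled = days[:]
random.shuffle(shuffled)
random_parts = build_partitions(shuffled)

print(scanned_fraction(sorted_parts, 100))  # tiny: each partition has a clean date range
print(scanned_fraction(random_parts, 100))  # near 1.0: every partition's range overlaps
```

With sorted loading, one date lives in roughly one partition; with shuffled loading, almost every partition’s min/max range spans the whole year, so nothing can be pruned and everything gets scanned.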


Clustering: When Natural Ordering Isn’t Enough

Sometimes your users filter on more than one column, or your ingest pattern just isn’t naturally ordered.


That’s where clustering keys come in.


Clustering tells Snowflake which columns matter most for pruning. The goal is to get your “average depth” low, ideally 1, meaning Snowflake only needs to scan one micro partition to find the values it needs.


But Will makes an important point. You shouldn’t cluster blindly.


Check which predicates people actually use. Check their selectivity. Cluster only on columns that meaningfully reduce the number of partitions scanned.


And absolutely don’t use clustering to fix poor data types. If your dates are stored as strings, you’ll lose pruning no matter what.
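The “average depth” idea can be made concrete with a simplified version of the metric: for each value you might look up, count how many partitions’ min/max ranges contain it, then average. This is a toy approximation for intuition, not Snowflake’s exact calculation:

```python
def average_depth(ranges, probes):
    """Average number of partitions whose min/max range covers each probe value.
    A depth near 1 means a point lookup touches roughly one partition."""
    depths = [sum(1 for lo, hi in ranges if lo <= v <= hi) for v in probes]
    return sum(depths) / len(depths)

# Four partitions with disjoint ranges vs. four that all span everything.
well_clustered = [(0, 9), (10, 19), (20, 29), (30, 39)]
poorly_clustered = [(0, 39), (0, 39), (0, 39), (0, 39)]

print(average_depth(well_clustered, range(40)))    # 1.0 — perfect pruning
print(average_depth(poorly_clustered, range(40)))  # 4.0 — every lookup scans everything
```

An average depth of 1.0 means ranges don’t overlap and pruning is as good as it gets; as ranges overlap more, the depth climbs toward the total partition count and pruning stops helping.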


Updating Data Can Break Clustering

Because micro partitions are immutable, any update creates a new one. On big tables this can create churn and slowly destroy natural clustering over time.

That’s why insert-only patterns, such as Data Vault-style modelling in Snowflake, are so effective. They avoid re-versioning tons of micro partitions and keep everything nicely organized.
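The churn is easy to quantify in a sketch. Assuming updates target rows scattered across the table (the function and layout below are illustrative, not Snowflake internals), every touched partition must be rewritten in full:

```python
def partitions_rewritten(parts, keys_to_update):
    """Count how many immutable partitions an UPDATE forces Snowflake to rewrite."""
    return sum(1 for p in parts if any(k in p for k in keys_to_update))

# 10 partitions of 100 row keys each.
parts = [set(range(i, i + 100)) for i in range(0, 1000, 100)]

# Updating just 10 scattered rows rewrites 10 whole partitions
# (1,000 rows of churn); an insert-only pattern would instead
# append one small new partition and leave the rest untouched.
print(partitions_rewritten(parts, {5, 105, 205, 305, 405, 505, 605, 705, 805, 905}))  # 10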


A Few Practical Tips From Will

Will wrapped the session with some simple but powerful takeaways:


  • Know your workloads and your filter patterns

  • Check natural clustering before adding cluster keys

  • Prefer insert only and avoid unnecessary updates

  • Use proper data types for dates and numbers

  • Rebuild a table in sorted order if it’s heavily used; the speedup is often worth the small rebuild cost

  • Don’t just scale up the warehouse; fix the data layout first


These aren’t complicated tricks but they make an enormous difference.


The Big Lesson

Snowflake is fast out of the box, but it’s much faster when your micro partitions work with you instead of against you. By understanding how data is stored and how pruning works, you can deliver dramatic performance improvements without throwing more compute at the problem.

 
 