The Power Query
Editor in Power BI has many transformations that can be set from the UI. Keep
Top Rows is one of them. I'd like to take you on my journey through this
curious case I ran into a couple of weeks back when preparing for my session on
Query
Folding in Power BI.
Transformations in Power Query |
Problem
I was preparing my
demo's for my session at dataMinds
Connect on Query Folding in Power BI (slides). If you're not
familiar with query folding: transformations in Power Query are transformed
into the native (data source) language and executed at the source for better
performance. For a more thorough overview have a look at the post How Query Folding
Works from Matt Allington (B | @ExceleratorBI).
If we look at SQL
Server, some transformations can be folded (like Sorting, a simple Group By and
Keep Top Rows), others cannot (merging two columns from the UI, using the
Buffer() M-function and Keep Top Rows). See what I did there? :) Keep Top Rows
on its own is a foldable transformation, it is (obviously) transformed into the
TOP operator in SQL.
However, when using
Keep Top Rows in combination with some other transformations it (looks like it)
is not folding anymore. Your best chance is to use the View Native Query option in Power Query SQL Server
Profiler to check if the query is folding or not. More on that later :)
A bit of contemplation: I do think the real
use case of the Keep Top Rows transformation might be mostly in development
environments, where you can limit the number of rows to import for example. I
still think the findings were rather odd and worth sharing. So, let's have a
look at how it works.
Setup
I used import mode
for the Fact.Movement table from the WideWorldImportersDW
database. I then added the Keep Top (100) Rows transformation in the Power
Query Editor.
Keep Top Rows |
As you can see it
folds to SQL and all is good for now.
View Native Query with TOP 100 |
Adding transformations
Let's add a transformation on the Quantity column and apply an absolute value to it.
Add Absolute value of Quantity |
So now the option to
View Native Query is gone. Let's see if Power Query is speaking the truth
(spoiler: No) 😀
I started SQL Server Profiler and captured the queries running against the database after I Close
and Apply my changes. As you can see, now both the TOP operator and ABS function
are folded.
SQL Server Profiler: TOP 100 and ABS |
Now let's try
something different and add a filter on the Date Key. Something strange has
happened, because I already filtered the first 100 rows, so only 2 dates
remained after that. But when I click on the
Calculated Absolute Value query step I suddenly have a lot more dates
that shouldn't be there in my opinion.
Filter on Date Key |
Let's add a field
from the Customer dimension this time and see what happens.
Expand Customer Dimension |
The query for
Fact.Movement is still the same, so the TOP operator and ABS function are still
folding. And this is what is sent to the database for the Customer dimension:
SQL Server Profiler: Customer Dimension with WHERE clause |
The Curious Case
Now watch closely to
the where clause: so it actually already filters out the Customer Key based on
the steps performed earlier in the Query Editor. But this Customer dimension is a separate query
that returns 1 row, which will be joined to the other query inside Power Query.
Back to that WHERE clause: that Customer Key = 0 is valid for my (broken) Calculated Absolute Value query step, not for the Kept First 100 Rows as you see in the picture
above. So I actually get the wrong
a different result back into Power Query than I would expect.
I see you're
thinking: "But you didn't add an explicit order, right?" So the top
100 is a little arbitrary. That's true, so let's add that order by.
Add Sorted Rows |
This is a slightly
better result in my opinion. Because now the Power Query steps are in sync with
each other. The issue with View Native Query still remains the same by the way.
It stops working at the Calculated Absolute
Value query step, but the query send to the database is the following,
so including the ABS function:
View Native Query after Sort |
Recap
I showed several
variations in the Power Query Editor of using Keep Top Rows and showed that
adding subsequent steps brakes the View
Native Query option in the editor, although query folding sometimes does take
place. And adding an order to your query before applying a TOP operator is
always a good idea, not only in SQL.
The steps I used might not all be best practices when creating a data model in Power BI. In case of the join to the customer dimension for example,
it might be better to model it as a separate dimension, because STAR SCHEMA ALL
THE THINGS. But I used those steps to highlight the problem.
Trying to make the STAR SCHEMA ALL THE THINGS into a meme. https://t.co/D7n6EfwuIS— Koen Verbeeck (@Ko_Ver) October 24, 2019
I already mentioned this
No comments:
Post a Comment
Thanks for taking the time to provide feedback!
Cheers,
Nicky