Summary: Today I worked through a good case study demonstrating the benefits and importance of data modeling best practices, such as a star schema, in Power BI, and how they aid report size, refresh time, and performance, even with a rather simple source query.

Below I detail the original situation and how much improvement we see after switching to a star schema. Various segments have been redacted for privacy and security; this should not affect the reader's ability to follow along.

This write-up assumes some familiarity with the concept of a star schema in Power BI, using fact and dimension tables, and with the Power BI Desktop tool.


Situation

Using newly learned techniques from various training courses and resources, my team has been reviewing our existing Power BI reports for improvement opportunities.

One point of focus for us is a few reports that seem to have disproportionately long daily refresh times. We track these issues in a custom Admin report we designed to give us metrics on each report over time:

Admin Report, examining refresh times for various Power BI Reports.

Focusing on the report in question, we see that over the last two weeks it has averaged approximately 15-16 minutes to refresh its data. On the surface this seems excessive, given how simple the source query is.

Examining our “Long Refreshing Report” in particular, we see it averages 15 minutes to refresh.

Before we begin making any changes, note the current file size is 10.8 MB. We do not concern ourselves with the visual elements (3 slicers and a 2-layer matrix), as we are focused on the data model itself and data load times.

Shown below are the [redacted] query, which is the single source query in this report, and its related Applied Steps in the Power BI data model; refreshing this source data currently takes the report 15-16 minutes:

Various applied steps in our query.
Redacted overview of our existing main query.

Running the query in Oracle SQL Developer, we see there are approximately 2.3 million rows in this result set:

Row count of original query.

Updates

Following the normal procedure for converting a data model into a star schema, we modify our original source query to:

  • Remove duplicate, unnecessary columns like [code_value] on lines 2 and 17.
  • Trim the CASE logic on [Fund] (lines 5-8) down to just the [fund_id], which we will link to a dimension table to retrieve the appropriate display values.
  • Trim the CASE logic on [classification_type] (lines 19-22) down to just the [classification_id], linking to another dimension table for the full text.
  • Note: In our case, it was not necessary for this data to be individual-specific and include unique personal IDs, so we grouped the data up to a summary level in the source SQL directly instead of relying on Power BI to do this for us in visuals. We also modified the WHERE clause's date check: instead of comparing against a static, hard-coded date, it now always returns two years' worth of data measured back from January 1st of the current year, up through the current date (a sketch of this change follows the captions below).
Old way of date checking the data.
Updated way of checking the date for results.
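As a rough sketch of that date-check change (using a hypothetical [activity_date] column; the actual table and column names are redacted), the Oracle SQL went from a hard-coded comparison:

    WHERE activity_date >= DATE '2021-01-01'  -- static value, edited by hand

to a rolling window anchored to January 1st of the current year:

    WHERE activity_date >= ADD_MONTHS(TRUNC(SYSDATE, 'YYYY'), -24)  -- assumed offset: 2 years before Jan 1 of this year
      AND activity_date <= SYSDATE                                  -- up through the current date

TRUNC(SYSDATE, 'YYYY') returns January 1st of the current year, so the ADD_MONTHS offset is the only piece to adjust if the window ever needs to change.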

All these changes together result in the main query structure below. They make the query cleaner and extract the dimensional data we need out into separate queries/tables; otherwise, those dimension values would have to be replicated in each and every one of our fact table rows.

Updates made to the source query simplify the query and reduce the result set.

These changes take us from 2,329,236 rows in our main fact table query down to just the 4,024 rows of data we need.
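For illustration, the reshaped query follows a pattern roughly like the sketch below; every table and column name here is a hypothetical stand-in for the redacted originals. The dimensional CASE logic is gone, only the ID columns remain, and the detail rows are aggregated up to the summary grain:

    SELECT
        TRUNC(t.activity_date)   AS activity_date,
        t.fund_id,
        t.classification_id,
        COUNT(*)                 AS record_count
    FROM   source_table t       -- hypothetical source
    WHERE  t.activity_date >= ADD_MONTHS(TRUNC(SYSDATE, 'YYYY'), -24)
    AND    t.activity_date <= SYSDATE
    GROUP BY
        TRUNC(t.activity_date),
        t.fund_id,
        t.classification_id

Each row is now one summary bucket rather than one individual record, which is what takes the result set from millions of rows down to thousands.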

This updated source query becomes our Fact Table, factTable1, now with only a single applied step, which changes the DateTime field to a Date:

Applied Steps needed after updating our source query.

We then create Dimension tables for the:

  • Funds (dimTable1), containing 2 columns and 2 rows.
  • Source types (dimTable2), containing 4 columns and 16 rows.
  • A custom defined Date table (dimTable3), containing the [full_date], [year], [month_long], [month_num], and [fiscal_year] for every date we return in our main query for the fact table. These fields are pulled from a reusable and easily edited View we maintain in our database (a sketch of such a view follows this list).
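For reference, such a view can be sketched along these lines in Oracle SQL. This is only an assumed shape; the view name, date range, and the July fiscal-year boundary are all placeholders, not our actual [redacted] definition:

    CREATE OR REPLACE VIEW v_date_dimension AS
    SELECT
        full_date,
        EXTRACT(YEAR FROM full_date)   AS year,
        TO_CHAR(full_date, 'FMMonth')  AS month_long,
        EXTRACT(MONTH FROM full_date)  AS month_num,
        -- assumes a fiscal year starting July 1st; adjust to your organization
        CASE WHEN EXTRACT(MONTH FROM full_date) >= 7
             THEN EXTRACT(YEAR FROM full_date) + 1
             ELSE EXTRACT(YEAR FROM full_date)
        END                            AS fiscal_year
    FROM (
        -- generate one row per calendar day over an assumed range
        SELECT DATE '2015-01-01' + LEVEL - 1 AS full_date
        FROM   dual
        CONNECT BY LEVEL <= 7300  -- roughly 20 years of days
    );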

Shown below, we keep this dimension table filtered using List.Max() and List.Min() against the dates in our fact table. In this manner, we keep our date table limited to only the range of dates actually present in our fact data.

Advanced Editor steps showing how the updated query for Fact Table is arranged.
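For illustration, the pattern in the Advanced Editor looks roughly like this Power Query (M) sketch; the column name [date] on factTable1, the server name, and the view name are hypothetical stand-ins for the redacted originals:

    let
        // Min/max dates actually present in the fact table
        MinDate = List.Min(factTable1[date]),
        MaxDate = List.Max(factTable1[date]),
        // The reusable date view maintained in the database
        Source = Oracle.Database("our_server", [Query = "SELECT * FROM v_date_dimension"]),
        // Keep only the dates that fall inside the fact table's range
        FilteredDates = Table.SelectRows(Source, each [full_date] >= MinDate and [full_date] <= MaxDate)
    in
        FilteredDates

Filtering the dimension this way keeps the date table exactly as wide as the fact data, so it never carries rows that no visual can use.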

After closing and applying our query changes, there are two things to keep in mind with this approach:

  1. First, we explicitly mark our Date dimension table as a Date Table ("Mark as date table" on the Modeling tab in Power BI).

    This lets Power BI know that this table is the one to use for all date fields across the model.
  2. Remember to wire up all your Dimension tables to your Fact table:
New data model splitting out Fact Table and Dimension Tables.

From here, we simply update our visuals to pull values from the newly created dimension table fields.

Conclusion

With the visuals updated, we save the report and trigger a one-time refresh on our Power BI Report Server.

The new refresh time after the query and data model changes is drastically reduced.

And there we have it!

With these relatively minor changes we see vast improvement, including easier maintainability and a refresh time reduced from 16 minutes to 1 minute. The report also now takes up only 124 KB of space on our report server, as opposed to 10.8 MB. Thanks for reading, and I hope this helps someone else in the future as it has helped me.