Incremental Refresh in Power BI, Part 2; Best Practice; Do NOT Publish Data Model Changes from Power BI Desktop

Incremental Refresh Best Practice, Do NOT Publish Changes from Power BI Desktop

In a previous post, I shared a comprehensive guide on implementing Incremental Data Refresh in Power BI Desktop. We covered essential concepts such as truncation and load versus incremental load, understanding historical and incremental ranges, and the significant benefits of adopting incremental refresh for large tables. If you missed that post, I highly recommend giving it a read to get a solid foundation on the topic.

Now, let’s dive into Part 2 of this series where we will explore tips and tricks for implementing Incremental Data Refresh in more complex scenarios. This blog follows up on the insights provided in the first part, offering a deeper understanding of how Incremental Data Refresh works in Power BI. Whether you’re a seasoned Power BI user or just getting started, this post will provide valuable information on optimising your data refresh strategies. So, let’s begin.

When we publish a Power BI solution from Power BI Desktop to Fabric Service, we upload the data model, queries, reports, and the loaded data into the data model to the cloud. In essence, the Power Query queries, the data model and the loaded data will turn to the Semantic Model and the report will be a new report connected to the semantic model with Connect Live storage mode to the semantic model. If you are not sure what Connect Live means, then check out this post where I explain the differences between Connect Live and Direct Query storage modes.

The Publish process in Power BI Desktop makes absolute sense in the majority of Power BI developments. While Power BI Desktop is the predominant development tool to implement Power BI solutions, the publishing process is still not quite up to the task, especially on more complex scenarios such as having Incremental Data Refresh configured on one or more tables. Here is why.

As explained in this post, publishing the solution into the service for the first time does not create the partitions required for the incremental refresh. The partitions will be created after the first time we refresh the semantic model from the Fabric Service. Imagine the case where we successfully refreshed the semantic model, but we need to modify the solution in Power BI Desktop and republish the changes to the service. That’s where things get more complex than expected. Whenever we republish the new version from Power BI Desktop to Fabric Service, we get a warning that the semantic model exists in the target workspace and that we want to Overwrite it with the new one. In other words, Power BI Desktop currently does not offer to apply the semantic model changes without overwriting the entire model. This means that if we move forward, as the warning message suggests, we replace the existing semantic model and the created partitions with the new one without any partitions. So the new semantic model is now in its very first stage and the partitions of the table(s) with incremental refresh are gone. Of course, the partitions will be created during the next refresh, but this is not efficient and realistically totally unacceptable in production environments. That’s why we MUST NOT use Power BI Desktop for republishing an already published semantic model to avoid overriding the already created tables’ partitions. Now that Power BI Desktop does not support more advanced publishing scenarios such as detecting the existing partitions created by the incremental refresh process, let’s discuss our other options.

Alternatives to Power BI Desktop to Publish Changes to Fabric Service

While we should not publish the changes from Power BI Desktop to the Service, we can still use it as our development tool and publish the changes using third-party tools, thanks to the External Tools support feature. The following subsections explain using two tools that I believe are the best.

Continue reading “Incremental Refresh in Power BI, Part 2; Best Practice; Do NOT Publish Data Model Changes from Power BI Desktop”

Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop

Incremental-Refresh-in-Power-BI-Part-1-Implementation-in-Power-BI-Desktop

Incremental refresh, or IR, refers to loading the data incrementally, which has been around in the world of ETL for data warehousing for a long time. Let us discuss incremental refresh (or incremental data loading) in a simple language to better understand how it works.

From a data movement standpoint, there are always two options when we transfer data from location A to location B:

  1. Truncation and load: We transfer the data as a whole from location A to location B. If location B has some data already, we entirely truncate the location B and reload the whole data from location A to B
  2. Incremental load: We transfer the data as a whole from location A to location B just once for the first time. The next time, we only load the data changes from A to B. In this approach, we never truncate B. Instead, we only transfer the data that exists in A but not in B

When we refresh the data in Power BI, we use the first approach, truncation and load, if we have not configured an incremental refresh. In Power BI, the first approach only applies to tables with Import or Dual storage modes. Previously, the Incremental load was available only in the tables with either Import or Dual storage modes. But the new announcement from Microsoft about Hybrid Tables greatly affects how Incremental load works. With the Hybrid Tables, the Incremental load is available on a portion of the table when a specific partition is in Direct Query mode, while the rest of the partitions are in Import storage mode.

Incremental refresh used to be available only on Premium capacities, but from Feb 2020 onwards, it is also available in Power BI Pro with some limitations. However, the Hybrid Tables are currently available on Power BI Premium Capacity and Premium Per User (PPU), not Pro. Let’s hope that Microsft will change its licensing plan for the Hybrid Tables in the future and make it available in Pro.

I will write about Hybrid Tables in a future blog post.

When we successfully configure the incremental refresh policies in Power BI, we always have two ranges of data; the historical range and the incremental range. The historical range includes all data processed in the past, and the incremental range is the current range of data to process. Incremental refresh in Power BI always looks for data changes in the incremental range, not the historical range. Therefore, the incremental refresh will not notice any changes in the historical data. When we talk about the data changes, we are referring to new rows inserted, updated or deleted, however, the incremental refresh detects updated rows as deleting the rows and inserting new rows of data.

Benefits of Incremental Refresh

Configuring incremental refresh is beneficial for large tables with hundreds of millions of rows. The following are some benefits of configuring incremental refresh in Power BI:

  • The data refreshes much faster than when we truncate and load the data as the incremental refresh only refreshes the incremental range
  • The data refresh process is less resource-intensive than refreshing the entire data all the time
  • The data refresh is less expensive and more maintainable than the non-incremental refreshes over large tables
  • The incremental refresh is inevitable when dealing with massive datasets with billions of rows that do not fit into our data model in Power BI Desktop. Remember, Power BI uses in-memory data processing engine; therefore, it is improbable that our local machine can handle importing billions of rows of data into the memory

Now that we understand the basic concepts of the incremental refresh, let us see how it works in Power BI.

Implementing Incremental Refresh Policies with Power BI Desktop

We currently can configure incremental refresh in the Power BI Desktop and in Dataflows contained in a Premium Workspace. This blog post looks at the incremental refresh implementation within the Power BI Desktop.

After successfully implementing the incremental refresh policies with the desktop, we publish the model to Power BI Service. The first data refresh takes longer as we transfer all data from the data source(s) to Power BI Service for the first time. After the first load, all future data refreshes will be incremental.

How to Implement Incremental Refresh

Implementing incremental refresh in Power BI is simple. There are two generic parts of the implementation:

  1. Preparing some prerequisites in Power Query and defining incremental policies in the data model
  2. Publishing the model to Power BI Service and refreshing the dataset

Let’s briefly get to some more details to quickly understand how the implementation works.

  • Preparing Prerequisites in Power Query
    • We require to define two parameters with DateTime data type in Power Query Editor. The names for the two parameters are RangeStart and RangeEnd, which are reserved for defining incremental refresh policies. As you know, Power Query is case-sensitive, so the names of the parameters must be RangeStart and RangeEnd.
    • The next step is to filter the table by a DateTime column using the RangeStart and RangeEnd parameters when the value of the DateTime column is between RangeStart and RangeEnd.

Notes

  • The data type of the parameters must be DateTime
  • The datat tpe of the column we use for incremental refresh must be Int64 (integer) Date or DateTime.Therefore, for scenarios that our table has a smart date key instead of Date or DateTime, we have to convert the RangeStart and RangeEnd parameters to Int64
  • When we filter a table using the RangeStart and RangeEnd parameters, Power BI uses the filter on the DateTime column for creating partitions on the table. So it is important to pay attention to the DateTime ranges when filtering the values so that only one filter condition must have an “equal to” on RangeStart or RangeEnd, not both
Continue reading “Incremental Refresh in Power BI, Part 1: Implementation in Power BI Desktop”

What Does XMLA Endpoints Mean for Power BI and How to Test it for Free?

Test Environment from Power BI XMLA Endpoint

XMLA endpoint connectivity for public preview has been announced late March 2019. As at today, it is only available for Power BI Premium capacity users. This sounds like a massive restriction to a lot of people who don’t have a Premium capacity, but they’d love to see how it works. In this article I show you an easy way to get your hands to Power BI XMLA endpoint as quick as possible. Before I start, I’d like to simply explain what XMLA endpoint is and what it really means for Power BI users.

Power BI is Like Onion! It has layers!

Generally speaking, Power BI has two different layers, presentation layer and data model layer. Presentation layer is the visual layer, the one you make all those compelling reports and visualisations. The data model as the name resembles, is the layer that you make your data model in. This layer is the one you can access it via XMLA connectivity.

In a Power BI Desktop file, you can see both layers:

Different layers of Power BI

How XMLA Relates to Different Layers in Power BI?

As you may have already guessed, XMLA is only related to the data model layer and it has nothing to do with the presentation layer. So you may connect to a data model, browse the data model, import data from the model to other platforms like Excel and so forth.

XMLA Is Not New!

Seriously? Yes, seriously. It is not new. It’s been around for many years and perhaps you’ve already used it zillions of times. Whenever you’re connecting to an instance of SQL Server Analysis Services, either Multidimensional or Tabular from any tools like SQL Server Management Studio (SSMS), Power BI Report Builder, Excel, Tableau, etc…, you’re using XMLA connectivity indeed.

Power BI is an Instance of SSAS Tabular

It is true. Power BI runs a local instance of SSAS Tabular model. So, whenever you open a Power BI Desktop file (PBIX), Power BI creates a local instance of SSAS Tabular model with a random local port number that can be accessed on your local machine only. When you close the file, the local instance of SSAS Tabular is shut down and its port number is released.

I first revealed the fact that you can connect to the underlying data model in Power BI Desktop from whole different range of tools like SSMS, SQL Server Profiler, Excel, etc… on Jun 2016. So, we indeed were using XMLA to connect to Power BI data models for a long time. We can even take a step further to import our Power BI data models into an instance of SSAS Tabular. In that sense, we are literally generating XMLA scripts from Power BI to create the same data model in SSAS Tabular. How cool is that?

Sooo… What is new then?

Continue reading “What Does XMLA Endpoints Mean for Power BI and How to Test it for Free?”

Four Different Ways to Find Your Power BI Desktop Local Port Number

Power BI Desktop Local Port

About two years ago I wrote a blog post on how to connect to a Power BI Desktop model from different tools like SSMS, Excel, and SQL Server Profiler. A big chunk of that blog post was about different ways of finding Power BI Desktop local port. Having the port number is crucial if you want to connect to a Power BI Desktop model from any sort of tool for different purposes like monitoring in SQL Server Profiler or analysing the model in Excel. I’ve been asked several times here and there about different reasons people need to find the Power BI Desktop local port number, so I decided to dedicate a post to the Power BI Desktop local port number.

When you run Power BI Desktop, it runs a local instance of SSAS Tabular model in the background and it assigns a random port number to that local instance. Therefore, if you want to connect to the model to analyse it in Excel you need to know that port number. There are several ways to achieve the port number and this post will cover most of them.

As you may know, Power BI Desktop is now available as a Windows App that can be installed from Windows Store. In general, the main difference between the App and the downloaded edition from the Microsoft website is that you will get a notification whenever a newer version is available to download. Then you have to manually download and install the newer version on your machine, but, the App will be updated automatically. However, some other differences are out of the scope of this article. There is just one difference that is relevant to this article which is that Power BI Desktop creates its temporary folders in a different location than the Power BI Desktop App does.

The methods described in this post work for both the downloaded edition and the Store edition.

Different Methods to Find Power BI Desktop Local Port

· Using DAX Studio

· Tabular Editor

· Power BI Desktop Temporary Location

· Command Line (CMD)

The reason that I explain different methods is that not all methods are doable in all environments. You may work at a customer site that doesn’t allow you to install new software like DAX Studio on their machines. But you always have access on Power BI Desktop temporary files.

Continue reading “Four Different Ways to Find Your Power BI Desktop Local Port Number”