Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

I explained what SCD means in a Business Intelligence solution in my previous post. We also discussed that while we do not expect to handle SCD2 in a Power BI implementation, we can handle scenarios similar to SCD1. In this post, I explain how to do so.

Scenario

We have a retail company selling products. The company releases the list of products in Excel format, including list price and dealer price, every year. The product list is released on the first day of July when the financial year starts. We have to implement a Power BI solution that keeps the latest product data to analyse the sales transactions. The following image shows the Product list for 2013:

Products List 2013

So each year, we receive a similar Excel file to the above image. The files are stored on a SharePoint Online site.

Scenario Explained

As the previous post explains, an SCD1 always keeps the current data by updating the old data with the new data. So an ETL process reads the data from the source, identifies the existing data in the destination table, inserts the new rows to the destination, updates the existing rows, and deletes the removed rows.

Here is why our scenario is similar to SCD1, with one exception:

  • We do not actually update the data in the Excel files and do not create an ETL process to read the data from the Excel files, identify the changes and apply the changes to an intermediary Excel file
  • We must read the data from the source Excel files, keep the latest data while filtering out the old ones and load the data into the data model.

As you see, while we are taking a very different implementation approach, the results are very similar with an exception: we do not delete any rows.

Implementation

Here is what we should do to achieve the goal:

  • We get the data in Power Query Editor using the SharePoint Folder connector
  • We combite the files
  • We use the ProductNumber column to identify the duplicated products
  • We use the Reporting Date column to identify the latest dates
  • We only keep the latest rows

Getting Data from SharePoint Online Folder

As we get the data from multiple files stored on SharePoint Online, we have to use the SharePoint Folder connector. Follow these steps:

  1. Login to SharePoint Online and navigate to the site holding the Product list Excel files and copy the site URL from the browser
Getting SharePoint Online Site URL
  1. From the Get Data in the Power BI Desktop, select the SharePoint Folder connector
  2. Click Connect
Connecting to SharePoint Online Folder from Power BI
  1. Paste the Site URL copied on step 1
  2. Click OK
Connecting to SharePoint Online Folder from Power BI using the SharePoint Folder connector
  1. Click Transform Data
Transforming data in Power Query Editor
Continue reading “Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1”