
Using PnP.PowerShell to retrieve SharePoint Online List Data

The Problem

My team and I currently work with Power BI Report Server, and we occasionally encounter hurdles in getting data to refresh automatically from certain sources. In this case, we were unable to directly connect to and refresh data from our organization's SharePoint Online site list. In this post I detail how I use the PnP.PowerShell module to retrieve SharePoint Online site list data.

The Solution

After a few development trials, I crafted the following sample PowerShell script, which uses the locally logged-in user's credentials and the PnP.PowerShell module to easily retrieve data from SharePoint Online site lists. As long as the local user has access to the site list in question, and the module is installed on the machine, developers can simply paste this script into PowerShell ISE on their own machine and configure it for their own site list(s) and data column needs.

Upon execution, the script briefly pops up a dialog that passes local credentials to the SharePoint site in question, then shows a progress bar as data is exported and transformed, closing when finished:

Progress bar of data export progress from SharePoint Online site list data.

This script can thus be run on demand to refresh the data as often as needed, or saved to a specified location and set up in Task Scheduler to run automatically:

Automatically run script each business day at 8:30AM.
Settings in Task Scheduler to run the specified .ps1 PowerShell Script.
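
As a hedged alternative to clicking through the Task Scheduler UI, the same task can be registered from PowerShell itself with the built-in ScheduledTasks module; the task name and script path below are placeholders:

# Sketch: register a weekday 8:30 AM task that runs the export script.
# NOTE: placeholder path/name; also, the script's -UseWebLogin step assumes
# cached login cookies are available when the task runs non-interactively.
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument '-ExecutionPolicy Bypass -File "C:\Scripts\Get-SPOListData.ps1"'
$trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Monday,Tuesday,Wednesday,Thursday,Friday -At 8:30am
Register-ScheduledTask -TaskName "Refresh SharePoint List Export" -Action $action -Trigger $trigger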

Future Plans

My team and I plan to rework this operation as a SQL Server Integration Services (SSIS) package, but at this time there are security concerns with utilizing Visual Studio 2019 / .NET 5 (now out of support), and Visual Studio 2022 has not yet added development features for SSIS – an irksome problem for me and many other developers since October 2021; more details on this here.

Conclusion

This post has covered in detail how I use the PnP.PowerShell module to retrieve SharePoint Online site list data. To see other PowerShell scripts I have developed, see my other PowerShell blog posts, or my PowerShell Operations GitHub repository. Thanks for reading!

SCRIPT:

# Script:        Get SharePoint Online List Data
# Developer:     James Scurlock

#
# Description:   Run from a developer machine in PowerShell ISE; uses local credentials to access a SharePoint Online site
#                and retrieve data from a given list for export to .csv/.xlsx. The intention is for this data to be pulled regularly
#                to refresh a Power BI report based upon said data.
#
# Last Modified: 7/6/2022
#--------------------------------------------------
# PRE-REQUISITE(S): 
#--------------------------------------------------
#    1. Have Access to the SharePoint Site List in question
#    2. Install-Module -Name PnP.PowerShell
#--------------------------------------------------
# DEVELOPER NOTE(S):
#--------------------------------------------------
#    PnP PowerShell Documentation: https://pnp.github.io/powershell/cmdlets/Connect-PnPOnline.html#interactive-login-for-multi-factor-authentication
#--------------------------------------------------
# CONFIGURATION PARAMETER(S):
#--------------------------------------------------
$siteURL = "https://{yourDomain}.sharepoint.com/sites/{nameOfSite}"
$listName = "{nameOfList}"
$exportFileLocation = "\\fileServer\ITGroup\DBTeam\SharePointExportedData\"
$CSVPath = $exportFileLocation + "Exported Data.csv"
#$ExcelPath = $exportFileLocation + "Exported Data.xlsx"

# EDIT $propertiesToExport below as needed for the fields to retrieve and rename upon export.
$propertiesToExport = @("ID","Title","field_1","field_2","field_3"); 
#--------------------------------------------------
# BEGIN SCRIPT
#--------------------------------------------------
try 
{
	# Check if Module is installed on current machine
    if (Get-Module -ListAvailable -Name PnP.PowerShell) {
        $dataForExport = @()
        cls
    
        # Connect-PnPOnline: https://pnp.github.io/powershell/cmdlets/Connect-PnPOnline.html
        # INFO - The following prompts for credentials, but fails with either "403 Forbidden" or a non-matching name/password error
        #        in the Microsoft account system. Will not work if Multi-Factor Authentication (MFA) is enabled.
        #    Connect-PnPOnline -Url $siteURL -Credentials (Get-Credential)
        # INFO - Non-Admin: The following returns "The sign-in name or password does not match one in the Microsoft account system" -
        #        assuming this also fails due to Multi-Factor Authentication.
        #        Admin: AADSTS53000: Device is not in required device state: compliant. Conditional Access policy requires a compliant device, and the device is not compliant. The user must enroll their device with an approved MDM provider like Intune.
        #    $creds = (New-Object System.Management.Automation.PSCredential "<<UserName>>",(ConvertTo-SecureString "<<Password>>" -AsPlainText -Force))
        #    Connect-PnPOnline -Url $siteURL -Credentials $creds
        # Ex. 9 - Creates a brief popup window that checks for login cookies. Can use -ForceAuthentication to reset and force a new login.
        #         NOTE: This is required for systems utilizing Multi-Factor Authentication.
        Connect-PnPOnline -Url $siteURL -UseWebLogin 

        $counter = 0
   
        #$listItems = Get-PnPListItem -List $listName -Fields $propertiesToExport -PageSize 2000 # PageSize: the number of items to retrieve per page request
        #$listItems = (Get-PnPListItem -List $listName -Fields $propertiesToExport).FieldValues
        $listItemIDTitleGUID = Get-PnPListItem -List $listName # Retrieve all list items; only the Id is used below

        foreach ($item in $listItemIDTitleGUID)
        {
            $counter++
            Write-Progress -PercentComplete ($counter / $($listItemIDTitleGUID.Count)  * 100) -Activity "Exporting List Items..." -Status  "Exporting Item $counter of $($listItemIDTitleGUID.Count)"

            # Get each list item's specified properties by ID using the PnP cmdlet - https://pnp.github.io/powershell/cmdlets/Get-PnPListItem.html
            $listItems = (Get-PnPListItem -List $listName -Id $item.Id -Fields $propertiesToExport).FieldValues
    
            # Add the name/value pairing of each property to an object for export, renaming the generic
            # field_* columns to their actual field designations upon export.
            $renameMap = @{ field_1 = "ProjectSponsor"; field_2 = "InitiativeKey"; field_3 = "AreaOfFocus" }
            $listItems | ForEach-Object {
                $itemForExport = New-Object PSObject
                foreach ($field in $propertiesToExport) {
                    $exportName = if ($renameMap.ContainsKey($field)) { $renameMap[$field] } else { $field }
                    $itemForExport | Add-Member -MemberType NoteProperty -Name $exportName -Value $_[$field]
                }

                # Add the object with the above properties to the export array
                $dataForExport += $itemForExport
            }
        }

        # Export the result array to csv/xlsx files
        $dataForExport | Export-CSV $CSVPath -NoTypeInformation
        #$dataForExport | Export-Excel -Path $ExcelPath

        # Disconnect the current connection and clear token cache.
        Disconnect-PnPOnline
    } else {
        cls
        Write-Host "In order to execute this script, the Cmdlet Module {PnP.PowerShell} must be installed. Please open an Admin shell and run the following command: Install-Module -Name PnP.PowerShell"
    }
} catch {
    Write-Host $_.Exception.Message
    $Error.Clear()
}
#--------------------------------------------------
# END SCRIPT
#--------------------------------------------------

Data Modeling best practices in Power BI

Summary: Today I experienced a good case study proving the benefits and importance of data modeling best practices, such as a star schema, in Power BI, and how this aids report size, refresh time, and performance; even on a rather simple source query.

Below I detail the original situation, and how much improvement is seen when switching to a star schema. Various segments have been redacted for privacy and security; this should not affect the reader's ability to follow along.

This write-up assumes some familiarity with the concept of a star schema (fact and dimension tables) in Power BI, and with the Power BI Desktop tool.


Situation

Utilizing newly learned techniques from various training courses and resources, my team has been striving to review our existing Power BI Reports for improvement.

One point of focus for us is a few reports that seem to have disproportionately long daily refresh times. We track these issues in a custom Admin report we designed to give us metrics on each report over time:

Admin Report, examining refresh times for various Power BI Reports.

Focusing on our report in question, we see it has averaged approximately 15-16 minutes to refresh data over the last two weeks. On the surface this seems excessive, given how simple the source query is.

Examining our “Long Refreshing Report” in particular, we see it averages 15 minutes to refresh.

Before we begin making any changes, note the current file size is 10.8 MB. We do not concern ourselves with the visual elements (3 slicers and a 2-layer matrix), as we are focused on the data model itself and data load times.

The [redacted] query below is our singular source query in this report, shown with its related Applied Steps in the Power BI data model; refreshing this source data currently takes the report 15-16 minutes:

Various applied steps in our query.
Redacted overview of our existing main query.

Running the query in Oracle SQL Developer, we see there are approximately 2.3 million rows in this result set:

Row count of original query.
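
For reference, this count can be reproduced by wrapping the (redacted) source query in a simple aggregate; the inline comment below stands in for the actual query text:

-- Confirm the result-set size of the report's source query
SELECT COUNT(*) AS row_count
FROM   ( /* original source query */ );
-- returns approximately 2,329,236 rows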

Updates

Following the normal procedure for converting the data model into a star schema, we modify our original source query to:

  • Remove duplicate, unnecessary columns like [code_value] on lines 2 and 17.
  • Trim the case logic on [Fund] (lines 5-8) down to just the [fund_id], which we will link to a dimension table to retrieve the appropriate values.
  • Trim the case logic on [classification_type] (lines 19-22) down to just the [classification_id], linking to another dimension table for the full text.
  • Note: In our case, it was not necessary for this data to be individual-specific and include unique personal IDs, so we grouped the data up to a summary level in the source SQL directly, instead of relying on Power BI to do this for us in visuals. We also modified the WHERE clause's date check from a comparison against a static date to a rolling window that always returns two years' worth of data, as observed from January 1st of the current year, up through the current date (sketched after the captions below).
Old way of date checking the data.
Updated way of checking the date for results.
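
To make the date logic concrete, here is a hedged before/after sketch in Oracle SQL; the table and column names are hypothetical stand-ins for the redacted query:

-- BEFORE (hypothetical names): static cutoff date baked into the query
WHERE t.activity_date >= DATE '2020-07-01'

-- AFTER: rolling window from January 1st two years prior, up through the current date,
-- with the result grouped to a summary level instead of per-person rows
SELECT t.fund_id,
       t.classification_id,
       TRUNC(t.activity_date) AS activity_date,
       COUNT(*)               AS record_count
FROM   source_table t
WHERE  t.activity_date >= ADD_MONTHS(TRUNC(SYSDATE, 'YYYY'), -24)
AND    t.activity_date <= SYSDATE
GROUP BY t.fund_id, t.classification_id, TRUNC(t.activity_date);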

All these changes together result in the main query structure below. The query is now cleaner, and the dimensional data we need is extracted out into other queries/tables; otherwise, those dimension values would have to be replicated in each and every one of our fact table rows.

Updates made to the source query simplify it and reduce the result set.

These changes take us from 2,329,236 rows in our main fact table query down to just the 4,024 rows of data we need.

This updated source query becomes our Fact Table, factTable1, now with only a singular applied step to change the DateTime field to a Date:

Applied Steps needed after updating our source query.

We then create Dimension tables for the:

  • Funds (dimTable1), containing 2 rows and 2 columns.
  • Source types (dimTable2), containing 16 rows and 4 columns.
  • A custom-defined Date table (dimTable3), containing the [full_date], [year], [month_long], [month_num], and [fiscal_year] for every date returned by our main fact table query. These fields are pulled from a reusable and easily edited view we maintain in our database (sketched below).
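
As a rough, hypothetical sketch of such a date-dimension view (the actual view, names, start date, and fiscal-year offset are specific to our database):

-- Sketch of a reusable date-dimension view (Oracle); names and dates are placeholders
CREATE OR REPLACE VIEW v_date_dimension AS
SELECT d.full_date,
       EXTRACT(YEAR FROM d.full_date)                AS year,
       TO_CHAR(d.full_date, 'Month')                 AS month_long,
       EXTRACT(MONTH FROM d.full_date)               AS month_num,
       EXTRACT(YEAR FROM ADD_MONTHS(d.full_date, 6)) AS fiscal_year -- assumes a July 1st fiscal year
FROM (
    SELECT DATE '2015-01-01' + LEVEL - 1 AS full_date
    FROM   dual
    CONNECT BY DATE '2015-01-01' + LEVEL - 1 <= SYSDATE
) d;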

Shown below, we keep this dimension table filtered using List.Max() and List.Min() against the fact table's dates. In this manner, we keep our date table limited to only the range of dates that actually appears in our fact data:
Advanced Editor steps showing how the updated query for Fact Table is arranged.
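
As a rough illustration of that filtering step in Power Query M (dimTable3_Source and the column names here are hypothetical placeholders):

let
    // Sketch: trim the date dimension to the span of dates present in the fact table
    MinDate = List.Min(factTable1[activity_date]),
    MaxDate = List.Max(factTable1[activity_date]),
    FilteredDates = Table.SelectRows(dimTable3_Source, each [full_date] >= MinDate and [full_date] <= MaxDate)
in
    FilteredDates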

After closing and applying our Query changes, there are 2 things to keep in mind with this approach:

  1. First, we explicitly mark our Date dimension table as a Date Table ("Mark as date table") on the Data Modeling tab in Power BI.

    This lets Power BI know this table is to be used in relation to all date fields.
  2. Remember to wire up all your Dimension tables to your Fact table:
New data model splitting out Fact Table and Dimension Tables.

From here, we simply update our visuals to pull values from newly created dimension table fields.

Conclusion

With visuals now updated, we save the report and trigger a one-time refresh on the Power BI Report Server.

New refresh time after making query and data model changes is far reduced.

And there we have it!

With these minor changes we see vast improvement, including easier maintainability and a refresh time reduced from 16 minutes to 1 minute. The report also now takes up only 124 KB of space on our report server, as opposed to 10.8 MB. Thanks for reading, and I hope this helps someone else in the future as it has helped me.
