Combine Files
Combine Files

Getting Files (Folders) from SharePoint or OneDrive into Dataflows

With the introduction of Dataflows in Power BI, the platform’s capabilities have increased quite a bit.

(Note: I’ll do a detailed overview of Dataflows in another post.)

Dataflows became generally available not long ago, but have not yet reached feature parity to Power Query in the Desktop.

One really wildly used source that is missing is the ability to connect to folders. Especially SharePoint/OneDrive folders.

Fortunately, it turns out that it is mostly a UI issue, not a lack of functionality.

In order to join Excel files, you can use a technique that I learned a while back when the Folder connector in Desktop was not able to handle Excel sources.

This technique will require 2 queries.

The first will be a sample query that connects to one the files. We will turn it into a function and our second query will invoke this function on our selected list of files.

It will be a similar logic to connect to CSVs too.

Let`s see the solution:

For this task, first you need your OneDrive/SharePoint site URL

The URL looks like this:

OneDrive URL
OneDrive URL

Then in Dataflows, create a blank query.

Paste in this code:

let
   Query = (Name as text, Folder as text, Data as text, Type as text) =>
 let
   Source = SharePoint.Files("your-site-url", [ApiVersion = 15]),
   Navigation = Source{[Name = Name, #"Folder Path" = Folder]}[Content],
   #"Imported Excel" = Excel.Workbook(Navigation, null, true),
   #"Navigation 1" = #"Imported Excel"{[Item = Data, Kind = Type]}[Data]
 in
   #"Navigation 1"
 in
   Query

This will be our connector to an Excel file.

Change the “your-site-url” part to your actual site url.

Name this function GetDataFromExcel

This is connecting to a sheet or table in an Excel file.

You will see that the invoke function settings appear with 4 parameters.

Function Parameters
Function Parameters

You can try them out. This is what the input parameters should look like:

Let’s create our second query

I recommend you to create this part in the Desktop first to test if it works and then paste the code in Dataflows in a new query.

The first step is to create a blank query and copy the previous code.

Then create a second blank query.

Then right click on the new Query and select Advanced Editor.

let
     Source = SharePoint.Files("your-site-url", [ApiVersion = 15])
in
     Source 

Change “your-site-url” to your actual site url.

You will be prompted to sign in.

Then search for the files you want to combine.

Pro Tip: try to make sure that your filter will produce the right results in the future, so be as specific as you can be. Like set the filter on an attribute that currently won`t change the results but might solve problems in the future. For example: If filtering to “.xlsx” type already gave you the right list, that might not be the case in the future when other Excel files gets uploaded, so add a name or folder filter also.

Once you are filtered down to the needed files, use Choose columns to keep the Name and Folder Path.

Select columns
Select columns

Once we have all the needed parameters for the function, we can invoke it.

Create a custom column and add this code:

GetDataFromExcel(Text.From([Folder Path]),Text.From([Name]), "Table1", "Table")

Then use the expand button to get to the data.

Expand
Expand

After deleting the unnecessary columns and setting the data type, your M query should look like this:

let
   Source = SharePoint.Files("your-site-url", [ApiVersion = 15]),
   #"Filtered rows" = Table.SelectRows(Source, each Text.Contains([Name], "Sales")),
   #"Filtered rows 1" = Table.SelectRows(#"Filtered rows", each ([Extension] = ".xlsx")),
   #"Remove other columns" = Table.SelectColumns(#"Filtered rows 1", {"Name", "Folder Path"}),
   #"Invoked Custom Function" = Table.AddColumn(#"Remove other columns", "GetDataFromExcel", each GetDataFromExcel(Text.From([Name]), Text.From([Folder Path]), "Table1", "Table")),
   #"Expanded GetDataFromExcel" = Table.ExpandTableColumn(#"Invoked Custom Function", "GetDataFromExcel", {"Cart ID", "Date", "Sales", "Quantity", "Discount", "Profit", "Shipping Cost", "Payment Type", "Product ID", "Shop_id", "CustomerID"}, {"Cart ID", "Date", "Sales", "Quantity", "Discount", "Profit", "Shipping Cost", "Payment Type", "Product ID", "Shop_id", "CustomerID"}),
   #"Removed columns" = Table.RemoveColumns(#"Expanded GetDataFromExcel", {"Folder Path", "Name"}),
   #"Changed column type" = Table.TransformColumnTypes(#"Removed columns", {{"Cart ID", type text}, {"Date", type date}, {"Sales", type number}, {"Quantity", Int64.Type}, {"Discount", type number}, {"Profit", type number}, {"Shipping Cost", type number}, {"Payment Type", type text}, {"Product ID", Int64.Type}, {"Shop_id", type text}, {"CustomerID", type text}})
 in
   #"Changed column type"

Now you can paste it in Dataflows.

Make sure you turn off the load for the function. Now you can Save & Close.

Validating Queries
Validating Queries

Validation might take a while.

After it is done you should start a refresh and it is ready to be used.

Hope you like this blog post, it turned out to be quite long.

Please follow and like us:

About the author

Mihaly Kavasi

Data-driven Decision Making Enthusiast.
Power BI Expert and Trainer. Helping organizations and communities to use data effectively.
Big fan of RE and Electric cars.
Working @ AltiusData

View all posts