Reducing Publish time to Power BI Service

A Tip or Trick

I’ve been working on a Power BI file in Power BI Desktop and it’s gotten a fair amount of data in it. Each time I published the file to the Power BI Service, the time to complete the publishing was getting larger and larger. When it got past 20 minutes to publish, I started looking for ways to speed this up.

One of the data sources that Power BI can consume is another .pbix file that already exists in the Power BI Service. So, I separated my reports and data into 2 different .pbix files and referenced data the data model as a source from the reports .pbix file.

Publishing each .pbix file follows the same flow as any publish and the way to find the data model takes 2 steps. The first is to choose Get Data and Power BI Service:

Next you will see a dialog box showing workspaces in your service. Pick the workspace and correct source file and you’re done.

This has reduced my time of publishing from 20 minutes (or more) to seconds.

Pushing row data from Treasure Data to Azure SQL

I wanted to push data from Treasure Data into Azure SQL. The data was row based and roughly about 1 TB in size.

As I examined my options, there were quite a few, however, in the end, it turned out that the easy options were just not enough. I’ll walk through the options I considered and my thoughts on each.

Pushing from TD

  • Push from Treasure Data using MS Sql Connector. The good part of this option is that it’s built into Treasure Data and supports things such as only push updates. As I looked into this option, I was hoping it would work as this would make my job a lot easier. However, as I pushed data it turned out that pushing data to Azure SQL continually timed out. So after contacting TD support (who are *very* responsive), I kept reducing the array size to a point where the data transfer was functional, but the data throughput was so poor, moving the amount of data I wanted was a non-starter. On to option 2.
  • The second option I tried was also from the TD console. I pushed data from TD into Blob storage at Azure. This went fairly smoothly, but when I wanted to rename the file I was placing in Azure, I was not able to do so. On to option 3.
  • The third option I tried was to write a script using the TD command line. This gave me all the flexibility I wanted for naming the file as well as adding parameters so that I could move other tables in the future. Since I was starting a TD job, all the built in monitoring and logging for TD also kicked in. Also, all the compression & security for transport just worked. Nice.
    So, that was it pushing the data out of TD. Now to look at pulling the data in Azure SQL. In the architecture diagram, this covers steps 1 and 2.

Import into Azure SQL

  • On the Azure side, I wanted to notice whenever a new file showed up in blob storage. This turned out to be a nice fit for Azure Functions. As I looked into Azure Functions, there were two versions. One is in GA now (v1) and the second in preview (v2). As I looked into them, I realized my requirements were pretty basic, so I took a chance on the preview version (v2). Turns out writing an Azure functions in C# is really easy. I wrote 2 functions, one to uncompress the incoming file and the second to load the .csv file into Azure SQL. This item covers steps 3 and 4 of the architecture diagram.
  • Turns out there’s a new shiny on the horizon — Azure Data Factory (currently in preview). I suspect I’ll rewrite the Azure functions into Azure Data Factory as it moves from preview into GA. The main reasons in my mind are improved monitoring and alerting.

Npm and Node.js behind proxy

If you are behind a proxy, you’ll need to tell Node about the proxy. Normally, I would expect setting HTTP_PROXY and HTTPS_PROXY environment variables to do the trick. However, in the case of node, the commands you’re after are:

npm config set proxy <HTTP_PROXY>
npm config set https-proxy <HTTPS_PROXY>

Also, you can put these commands into your .npmrc file.