Skip to content

A Jump Start to Power BI Dataflows

A Jump Start to Power BI Dataflows

Let me say some words about my Jump Start Blog Series. This “Jump Starts” Articles should give you a short and practical overview about Power BI Key Topics. This means I do not write to much text. Sometimes some background knowledge about MS Data Platform in general is necessary. Sometimes not, please leave a Comment if something is unclear. The point of view in this blog is never the single Power BI Desktop as a single Application, it is always the whole Data platform including Azure Services. So more from a strategy point of view. The content will be either automatically translated to English or German.

Demystifying Power BI Dataflows…

Power BI Dataflows are just “Power Query” in the Web, but…     

! KEY take away !

  • Data is stored in Azure Data Lake Gen 2 which makes it accessible for other Services like Databricks, Data Factory and others as well
  • According to Chris Webb, you can set relationships between the Dataflows (Entities) of the Data Lake (Defining Relationships Between Entities In The Common Data Model To Automatically Create Relationships In Power BI)
  • You can invoke Cognitive Services inside the Dataflows like Image Tagging and Text Analysis. Not Available for “pro” Users as far as I know.
  • You can map the Dataflows into the Common Data Model of Power Apps CDM
  • Copy and Paste your Power Query Script from Excel or Power BI Desktop to the Dataflows, however not all Data Sources are supported yet but will be added. LINK
  • In Power BI Premium also incremental Refresh is supported => out of the box, however there are already work around’s for “pro” users available.
  • As well linked and computed Entities are Premium only.
  • It is not Azure Data Factory (Data Flows) nor Microsoft Flow
  • Roadmap: Dataflows will be functionally equivalent to Power Query in Power BI Desktop

Introduction to Dataflows & Self-Service BI

  • Power BI Dataflows entities in purpose like tables in a DWH / Data Mart. The Data is stored in an Azure Data Lake Gen 2 Storage. This is the main differentiator to Power Query in Power BI Desktop.
  • However, due to many factors (including underlying storage in files in Azure Data Lake), it generally does not make sense to consider dataflows as a replacement for a data warehouse.
  • Dataflows do not replace data warehouses and ETLs, nor does Power BI Desktop replace the MS Data platform (SQL Database, Azure Data Factory, Databricks, …). Dataflows complement these professional tools and allow non-BI professionals to close more gaps in a BI solution.
  • BIG Data platforms such as Azure Data Lake Storage enable “modern data warehouse” scenarios that were previously unavailable and make them more accessible. (Dataset, hardware) The need for a “traditional” Kimball-style data warehouse has not diminished, and I think it’s unlikely to disappear soon.
  • The synergy between Power BI Dataflows and Azure Data Factory (as well as other Azure data services) is made possible by CDM folders in Azure Data Lake Storage gen2. The ADLS can be the new data hub.
  • Defining relationships between entities in the CDM folder in Azure Data Lake results in relationships being created between tables in a Power BI dataset
  • The main reason for placing Power BI Dataflows over or alongside a professional ETL tool such as Azure Data Factory, SSIS, Informatica or Oracle ODI is the user. A business user or analyst can create Power BI dataflows but can not use an ETL tool. Both worlds offer similar solutions to similar challenges.
  • There are going to be people who say you do not need a data warehouse, if you have dataflows – that’s wrong. This is as wrong as saying that self-service BI does not have to manage it.
  • Dataflows can be part of the solution, or they can be part of the problem.
  • Other Azure services like Azure ML, Databricks, Azure Data Factory can access CDM folders.

Dataflow Architecture

Power BI Dataflows entities in purpose like tables in a DWH / Data Mart. The Data is stored in an Azure Data Lake Gen 2 Storage. This is the main differentiator to Power Query in Power BI Desktop.

Enterprise Data Lake / CDM Common Data Model

The Common Data Model can be accessed by many other services. You can decide to use it or not. Classifying and assigning your data to the CDM the data can have many advantages.

First Conclusion

  • It takes a holistic data strategy in the company
  • Dataflows do not replace an enterprise DWH or master data management tool.
  • You must choose the right tool and implement it in the right way. You need to evaluate the tools based on your needs based on their capabilities rather than selling or marketing pitches.
  • Power BI Dataflow = Self Service ETL = Power Query (but,)
  • Power BI Dataflows can be used for data profiling
  • Root data that everyone in the company should use can be made available
  • Shared and certified datasets are coming
  • April 2019 released
  • Go one step further than Dataflows
  • Provides “linked” data models
  • Dataflows are just “tables”
  • Implement dataflows as part of a managed and managed self-service BI concept to fill in gaps. It makes no sense to fully load existing Enterprise Data Warehouse environments into a flow-based Data Lake

Possible Usage Scenarios

Scenario 1:

Existing Data Warehouse & Power BI

If you have already a Data Management Architecture in Place then you have basically 3 Options or a combination of them to connect to your existing data landscape.

  1. Direct Query & Live Connect to SQL Server or some other DBs like Snowflake DWH, Teradata, SAP Hana, Analysis Services
  2. Import, over 80 Data Sources
  3. Dual (Import & Direct Query)

Scenario 2:

Existing Data Warehouse + Self Service ETL

If you want to extend your organizational self-service BI capabilities and also manage it in the same time, which is very important, then this scenario might fit for you. Provide Dataflows derived from your Data Warehouse or other applications for re-use in multiple Power BI data sets. We can talk here for example about a Master data hub, providing all important Master data sets to your organisation, like Chart of Accounts, Product catalogue, Time dimension and so on. The same you can do for some Fact data sets, like sales, forecast or weather data. Simply also for multiple re-use in Power BI Data Sets or even other Dataflows. Set User Access Rights to manage this piece of your architecture. I presented the master data hub concept for the first time on 30 November 2018 in a webinar: https://dataviz.boutique/events/event/learning-webinar-power-bi-architecture/

Scenario 3:

Greenfield + Self Service ETL

The point is here, if you start greenfield, better do everything in the cloud if possible. Take your “LEGO® bricks” you need from the Microsoft Data Platform. All work more or less together.

Demo Dataflows

As described earlier, if you have Master Data which many Departments in your Organisation wants to use, then you can think of providing them with Power BI Dataflows to your End Users, you can simply call it a Master Data Hub, maybe derived from different Sources, like your Enterprise Data warehouse or from your Master Data Management tool or any other application like a CRM.

A Master data hub, providing all important Master data sets to your organisation, like Chart of Accounts, Product catalogue, Time dimension and so on.

The same you can do for some Fact data sets, like sales, forecast or weather data. Simply also for multiple re-use in Power Bi Data Sets or even other Dataflows. Set User Access Rights to manage this piece of your architecture.

Linked Entities

Cognitive Services

Dataflows Security

Many approaches possible, one could be:

  1. Create a Workspace for Master data (Chart of Accounts, Entity, Time, Cost center.)
  2. Create a Workspace for each set of Transactional Data (HR, Finance, Production,)
  3. Link Dataflows to avoid Data duplication
  4. Give Read Access to that User Group which should be allowed to consume Dataflows.

Roadmap

Dataflows will be functionally equivalent to Power Query in Power BI Desktop

Views: 919