This blog post is following on my previous posts using community tools to improve your data model. Previously we used Tabular Editor: here and here. Now let’s define what RI (referential integrity) violations are? This is how it is described in the official documentation:
The blank virtual rows are effectively Unknown Members. Unknown members represent referential integrity violations where the “many” side value has no corresponding “one” side value. Ideally these blanks should not exist, and they can be eliminated by cleansing or repairing the source data.
https://docs.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-understand#regular-relationships
There are two types of violations:
Type 1 – Fact table missing key value
One reason for referential integrity problems if you miss the id from the fact table.
Type 2 – Dimension missing key value
The other reason for referential integrity problems if you miss the id from the dimension table.
Result:
In both cases a blank row will appear in a visual, aggregating all the rows that cannot be matched to any of the dimension values.
Why this can be a problem?
In terms of performance for simple and small models this won`t really influence performance. But in large, complex, or direct query data models can cause problems.
It is the most obvious in DQ. If you have referential integrity violations than the joins in the query PBI generates will need to be outer join instead of inner join. Therefore, there is an option in relationships to assume referential integrity because that have substantial performance effect.
Even if you are in import mode these issues can have an effect on your performance, although less obvious.
DAX Studio – Data model metrics:
You can use DAX Studio to uncover RI violations.
How to solve Referential Integrity violations?
The reason we separated the violations into two types is that the solution for these is different.
Type 1 – Fact table missing key value
- First always check if the data is valid if the missing value is acceptable. If not fix at the source.
- If missing values is valid, create an empty/undefined row in your dimension and map your missing fact rows to that.
Type 2 – Dimension missing key value
If you have values in your fact table that are absent in your dimension table than it most likely means that there is an issue with the logic you prepared your data or the data source itself.
There can be few explanations why you have this:
- data value in the fact table was entered incorrectly and that id never existed
- id got deleted and there was not foreign key constraint on the tables
- only active ids were loaded in
So, the solution is to correct the data preparation to make sure all the ids are added to the dimension tables or invalid facts are removed or corrected.
There is an excellent blog post by Phil Seamark about this:
Resources:
- Dax Studio: https://daxstudio.org/
- Model Relationships Documentation: https://docs.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-understand
Summary
If your data is perfect, as it usually is ?, you do need to care about this. But if you are among those unlucky few who must work imperfect data than this can mean the difference between usable and unusable performance.