Some technically simple features can deliver value to organisations if used properly. Dataset certification is, in essence, a labelling system to improve visibility of quality data sources. Here is one way to make the most of this feature for data in democratic data-driven companies.
This article is about understanding the implications of dataset certification and the conscious decision on the gold-standard that those certified datasets should meet.
Power BI, being the large-scale, wide-scope, versatile and truly empowering business analytics platform that it is, comes with a myriad of features and capabilities. Some of them have complex and very smart engines under the hood, while others are quite simple in concept and implementation.
The latter can also deliver interesting value to the organisation, which is the final objective: it is not about how cool some tools or features are, it is about the benefit. However, that added value and business impact depend on how these “less sophisticated” features are utilised, implemented and understood.
Dataset Certification is, basically, a labelling mechanism in Power BI Service that highlights important (“Certified”) datasets for use by both report authors and business analysts, whether they work in Power BI or in Excel. That’s it: just a label that identifies what your organisation classifies as “Certified Datasets” and brings them to the top of the list of Power BI datasets that a given user has access to.
As there are already plenty of resources that explain the technical part of “how to” declare a Power BI Dataset “Certified”, I will skip that part in this article. If you are interested in learning the technical “how”, I suggest that you go through Microsoft’s official documentation for “Certify Datasets – Power BI” and complement it with Adam Saxton’s (Guy in a Cube) enthusiasm in this video about “Shared & Certified datasets in Power BI”.
So, before jumping into rubber-stamping Power BI Datasets (aka certifying them), there are three important things to consider:
1.- There are certain “gold-standards” that you want your organisation’s certified datasets to meet.
2.- There is only one degree of governed certification, so you only get one shot at it.
3.- Because you have only one definition of certification, it is better to get it right from the beginning.
Let us talk about (2) first. There are two different labels for Power BI Datasets: “Promoted” and “Certified”. Whilst the former is self-granted (dataset owners can promote their datasets on their own), the latter is governed by your Power BI Admins, who can delegate the “certification powers” to the dataset certification governing body.
The fact is that Certification is binary, i.e. a dataset is either certified or not. Which brings us to the key question that this article revolves around:
What value do I want to deliver to users and the organisation with dataset certification, and what benefits need to be very clear to my users when they use Certified Datasets?
This question brings us to the next point.
What users should understand, when they use a Certified Power BI Dataset, is that such a dataset is:
- Trusted, by automated, comprehensive, and published DQ checks.
- Resilient and Reliable, by proper change management process and proper support model.
- Versatile, by enabling different use cases and having a feedback mechanism from users.
- Timely, by having relevant data when it is needed.
- Targeted, by ensuring that relevant users can exploit it.
- Secure, by limiting the data to what users really need to see.
- Usable, by providing documentation, training, and subject matter expert availability.
- Performant, with insights always delivered in seconds or less, not minutes.
So, they know immediately that such a dataset is the best option to base their business analysis on, and that it will very likely meet their expectations and needs, securely and consistently, now and in the future.
Now, it is up to the organisation, and the dataset certification governing body (which in my opinion is not Power BI Admins, but some form of Data CoE), to define the specific criteria that tick all the boxes above. It is also up to that governing body to define the certification process and the periodic review of Certified datasets.
Such criteria need to be exposed to Certified dataset consumers as well, so they understand how all those gold standards are met, which only increases confidence in these datasets and in the process to rubber-stamp them.
We could list potential criteria, such as training availability, subject matter expert availability, automated DQ checks, a well-defined change process, well-governed security, different SLAs, a business continuity plan, a support system… Whilst this list could be a long one, it might not be very different from the controls imposed on other well-governed databases in your organisation.
All this looks good on its own, but we always need to ask ourselves: what is the real value derived from this? Or in other words, is this worth doing?
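To make the idea of a published checklist more concrete, here is a minimal sketch in Python of how a governing body might record a certification review. The criteria names, the `CertificationReview` class and the example dataset are all hypothetical and are not part of Power BI; the only point illustrated is that certification is binary, so every criterion has to be met.

```python
from dataclasses import dataclass, field

# Hypothetical criteria names, loosely based on the list above; a real
# governing body would define and publish its own.
REQUIRED_CRITERIA = (
    "training_available",
    "sme_available",
    "automated_dq_checks",
    "change_process_defined",
    "security_governed",
    "sla_defined",
    "business_continuity_plan",
    "support_model",
)


@dataclass
class CertificationReview:
    dataset_name: str
    # Maps each criterion to True (evidence accepted) or False.
    evidence: dict = field(default_factory=dict)

    def missing(self):
        """Return the criteria that are not yet met, in checklist order."""
        return [c for c in REQUIRED_CRITERIA if not self.evidence.get(c, False)]

    def is_certifiable(self):
        """Certification is binary: every criterion must be met."""
        return not self.missing()


# Example review: everything is in place except a defined SLA.
review = CertificationReview(
    dataset_name="Sales (example)",
    evidence={c: True for c in REQUIRED_CRITERIA if c != "sla_defined"},
)
print(review.is_certifiable())  # False
print(review.missing())         # ['sla_defined']
```

Publishing the output of `missing()` alongside the criteria is one possible way to show dataset owners exactly what stands between them and the label.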
In this case, the values can easily justify the overhead that comes from “certifying datasets”:
- Accelerates data-driven insights, as analysts don’t need to spend time finding the relevant data source, then extracting and shaping it to their needs, only to find that implementing the required business logic takes time and that the final system is slow, or breakable, or both…
- Accelerates decision making, as insights come faster to decision makers’ hands, and decision makers have less need to challenge how those insights were produced.
- Improves decision making, as it is more likely that the insights used are accurate, timely and reliable.
- Reduces data silos, as analysts don’t need to figure out how to source their reports and, eventually, build their own data sources that they can trust.
Consistency is key, as we do not want to find different degrees of quality in our certified data sources. All of them need to be equally trusted, resilient, secure, … So, it is important that you start with a perfectly acceptable set of gold-standard conditions and rules that endorse all those nice attributes that we want ALL our organisation’s certified datasets to meet. It is therefore not wise to change such conditions and rules halfway.
Assuming that getting the perfect criteria upfront might be difficult, there are two approaches: start with very high standards or with standards just good enough.
What can happen with the latter is that your standards don’t really ensure all those attributes (trustworthiness, reliability, security…) and, at some point, you realise that you need to raise them by making your criteria more demanding. You then have datasets that were certified before but wouldn’t cut it today. What do you do then? Demote those suboptimal datasets? That is not very fair, as dataset owners might have worked (and spent money and resources) to meet criteria that are no longer valid. So you either keep different degrees of certification, as not all Certified datasets are equally good, or demote them, or give them a grace period to adjust to the new criteria.
A better approach is to shoot high: take some of the solutions that represent your organisation’s best-in-class datasets and start defining the criteria with those in mind. Later, there is always time to relax the conditions to let other value-adding datasets fill the ranks of Certified ones, while all your datasets still meet the (revised) high standards.
What amazes me, and what urged me to write about Power BI Certified datasets, is the real value that can be derived from a feature as simple as a “tagging” system, if it is used wisely.
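The asymmetry between raising and relaxing criteria can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: criteria are modelled as a set of named checks, and a dataset certified under a given set is assumed to have passed exactly those checks. All check names are hypothetical.

```python
def meets(passed_checks, required_checks):
    """A dataset qualifies when it has passed every required check."""
    return set(required_checks) <= set(passed_checks)


# Hypothetical check names, for illustration only.
strict = {"dq_checks", "change_process", "support_model", "documentation"}
relaxed = strict - {"documentation"}   # standards lowered later
lenient = {"dq_checks", "change_process"}
raised = lenient | {"support_model"}   # standards raised later

# Certified under the strict criteria: it passed everything in `strict`.
certified_high = set(strict)
# Certified under the lenient criteria: it passed only what was asked.
certified_low = set(lenient)

print(meets(certified_high, relaxed))  # True
print(meets(certified_low, raised))    # False
```

Relaxing the criteria never invalidates an existing certification, because the relaxed set is a subset of what was already passed. Raising them can, which is what forces the demote-or-grace-period decision.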