Notes from Mythbusting The Modern Data Catalog Webinar

You can watch the video here.

The webinar discusses four myths about data catalogs. These myths are:

A data catalog is either for governance or analytics, not both: False
Data governance is the set of processes and procedures for managing the availability, accessibility, integrity and security of the data in enterprise systems. These processes and procedures are governed by a set of standards and policies.

The speaker argues that data governance is important in generating business value. For example, in a survey conducted by 451 Research, 72% of respondents completely or mostly agreed that data governance is an enabler of business value in their organisations, rather than a cost. However, it is hard to measure the value generated by introducing data governance.

Data analysts spend a lot of time looking for data: True
A big part of this claim is that data analysts spend a lot of time finding data and preparing it for analytics. This process is a huge time sink and reduces the analysts' productivity, which is a loss for their organisations. Another survey from 451 Research, among 519 respondents, found that data analysts spend a mean of 48% of their time finding and preparing data.

A data catalog will solve data silos automatically: Depends
A data silo occurs when only one group within an organisation can access a source of data. 451 Research finds that among organisations with more than 1000 employees, 33% have more than 50 departmental data silos.

Finally, there can only be a single data catalog to “rule them all”: Depends
This claim is based on the assumption that a data catalog's usefulness is proportional to the number of data sources it is connected to. An implication of this is that a data catalog that can connect to everything will give users a perfect view of the data in the organisation. However, this depends on what the use cases are and who the users of the data catalog are.

The key takeaway from the webinar is that each organisation has its own processes and culture. A good data catalog should support these processes rather than work against them. Although this webinar has some interesting discussions and evidence, I find it lacking, as it only covers the surface of data catalogs and their procedures. The speaker did not go into specifics on how to address these claims.
