Managing the Data Lake - The Critical Importance of An Information Catalog

With so much new data being captured across the enterprise, and multiple self-service and data science initiatives under way, organisations need a way to know and track what is going on and what is available in an increasingly complex data landscape. At the same time, people need to be able to publish what data and artefacts (ETL jobs, data preparation jobs, analytical models, dashboards, etc.) already exist, to encourage re-use and prevent re-invention. This session shows how information catalogue software can be used to publish data and artefacts and so manage and organise a multi-platform analytical environment.

This session will cover:

What is an information catalogue?
Information catalogue capabilities, e.g. automatic data profiling, automatic tagging and data classification, automatic data indexing, faceted search, data marketplaces, artefact publishing
Information catalogue technology offerings
How does an information catalogue help govern a data lake?
Creating a governed information value chain using an information catalogue
Key roles and responsibilities – Information producers, information consumers and governance
Publishing data and analytics as a service
Integrating disparate metadata via Open Metadata and Governance
Integrating the catalogue with data management, data science, and BI technologies
Consumer trust – Accessing business glossaries and metadata lineage
