Managing the Data Lake – The Critical Importance of an Information Catalogue

With so much new data being captured across the enterprise, and with multiple self-service and data science initiatives under way, something has to track what is going on and what is available in an increasingly complex data landscape. At the same time, people need to be able to publish what data and what artefacts (ETL jobs, data preparation jobs, analytical models, dashboards, etc.) currently exist, to encourage re-use and prevent re-invention. This session shows how information catalogue software can be used to publish data and artefacts, and to manage and organise a multi-platform analytical environment.

This session will cover:

  • What is an information catalogue?
  • Information catalogue capabilities, e.g. automatic data profiling, automatic tagging and data classification, automatic data indexing, faceted search, data marketplaces, artefact publishing
  • Information catalogue technology offerings
  • How does an information catalogue help govern a data lake?
  • Creating a governed information value chain using an information catalogue
  • Key roles and responsibilities – Information producers, information consumers and governance
  • Publishing data and analytics as a service
  • Integrating disparate metadata via Open Metadata and Governance
  • Integrating the catalogue with data management, data science, and BI technologies
  • Consumer trust – Accessing business glossaries and metadata lineage