Data Products – From Design, to Build, to Publishing and Consumption

Most companies today store data and run applications in a hybrid, multi-cloud environment. Analytical systems tend to be centralised and siloed: data warehouses and data marts for BI, cloud storage data lakes for data science, and stand-alone streaming analytical systems for real-time analysis. These centralised systems rely on data engineers and data scientists working within each silo to ingest data from many different sources and engineer it for use in a specific analytical system or in machine learning models. There are many issues with this centralised, siloed approach, including multiple tools to prepare and integrate data, reinvention of data integration pipelines in each silo, and centralised data engineering teams with a poor understanding of source data that are unable to keep pace with business demands for new data.

To address these issues, a new approach called Data Mesh emerged in late 2019, aiming to accelerate the creation of data for use in multiple analytical workloads. Data Mesh is a decentralised, business-domain-oriented approach to data ownership and data engineering that creates a mesh of reusable data products, built once and shared across multiple analytical systems and workloads.

This half-day workshop looks at the development of data products in detail and at how you can use a data marketplace to share data products, and govern their sharing, across the enterprise to shorten time to value.

Learning Objectives:

  • Strengths and weaknesses of centralised data architectures used in analytics
  • The problems caused in existing analytical systems by a hybrid, multi-cloud data landscape
  • The emergence of data mesh and data products
  • What exactly a data product is and the types of data products that you can create
  • The benefits that data products offer and the implementation options available
  • How to organise the creation of data products in a decentralised environment so you avoid chaos
  • How business glossaries can help ensure data products are formally defined, understood by business users and semantically linked
  • The critical importance of a data catalog in understanding what data is available
  • The software required to build, operate and govern a data mesh of data products for use in a data lake, a data lakehouse or a data warehouse
  • What data fabric software is, how it integrates with data catalogs, and how it connects to data in your data estate
  • An implementation methodology to produce ready-made, trusted, reusable data products
  • Collaborative domain-oriented development of modular and distributed DataOps pipelines to create data products
  • How a data catalog and automation software can be used to generate DataOps pipelines
  • Managing data quality, privacy, access security, versioning, and the lifecycle of data products
  • Publishing semantically linked data products in a data marketplace for others to consume and use
  • Governing the sharing and use of data products in a data marketplace
  • Consuming data products in an MDM system
  • Consuming and assembling data products in multiple analytical systems like data warehouses, lakehouses and graph databases to shorten time to value.

 

Who is it for?
This seminar is intended for business data analysts, data architects, chief data officers, master data management professionals, data scientists, IT ETL developers, and data governance professionals. It assumes you understand basic data management principles and data architecture, and that you have a reasonable understanding of data cleansing, data integration, data catalogs, data lakes and data governance.

 

Detailed course outline
Most companies today store data and run applications in a hybrid, multi-cloud environment. Analytical systems tend to be centralised and siloed: data warehouses and data marts for BI, cloud storage data lakes or Hadoop for data science, and stand-alone streaming analytical systems for real-time analysis. These centralised systems rely on data engineers and data scientists working within each silo to ingest data from many different sources, then clean and integrate it for use in a specific analytical system or in machine learning models. There are many issues with this centralised, siloed approach, including multiple tools to prepare and integrate data, reinvention of data integration pipelines in each silo, and centralised data engineering teams with a poor understanding of source data that are unable to keep pace with business demands for new data. In addition, master data is not well managed.

To address these issues, a new approach emerged in late 2019 aiming to accelerate the creation of data for use in multiple analytical workloads. That approach is Data Mesh. Data Mesh is a decentralised, business-domain-oriented approach to data ownership and data engineering that creates a mesh of reusable data products, built once and shared across multiple analytical systems and workloads. A Data Mesh can be implemented in a number of ways, including on one or more cloud storage accounts, on an organised data lake, on a lakehouse, on a data cloud, using Kafka, or using data virtualisation. Data products can then be consumed in other pipelines: in streaming analytics, in data warehouses or lakehouse gold tables for business intelligence, in feature stores for data science, and in graph databases for graph analysis and other analytical workloads.
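To make the idea of a data product more concrete, the sketch below shows, purely as an illustration in Python, the kind of descriptive metadata a published data product might carry: an owning domain and owner, links to business glossary terms, quality SLAs and one or more consumable output ports. All names, fields and values here are hypothetical assumptions for illustration and are not tied to any particular data mesh platform.

```python
# A minimal, illustrative sketch of the metadata a published data product might carry.
# All field names and values are hypothetical examples, not tied to any specific platform.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OutputPort:
    name: str        # e.g. a table name or Kafka topic consumers read from
    format: str      # e.g. "delta", "parquet", "kafka"
    location: str    # physical address of the output in the data estate


@dataclass
class DataProduct:
    name: str                 # business name, ideally taken from the business glossary
    domain: str               # owning business domain, e.g. "sales"
    owner: str                # accountable domain team / data product owner
    version: str              # version of the product as it evolves over its lifecycle
    glossary_terms: List[str] = field(default_factory=list)     # semantic links
    quality_slas: Dict[str, str] = field(default_factory=dict)  # e.g. completeness, freshness
    output_ports: List[OutputPort] = field(default_factory=list)


# Example: one data product owned by the sales domain, consumable by BI and data science.
orders = DataProduct(
    name="Customer Orders",
    domain="sales",
    owner="sales-data-team@example.com",
    version="1.2.0",
    glossary_terms=["Customer", "Order"],
    quality_slas={"completeness": ">= 99%", "freshness": "daily by 06:00 UTC"},
    output_ports=[OutputPort("orders_daily", "delta", "s3://lake/sales/orders_daily")],
)
```

Metadata like this is what a data catalog or data marketplace would surface so that consumers can find, understand and trust the product before using it.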

This half-day workshop looks at the development of data products in detail. It also looks at the strengths and weaknesses of the data mesh implementation options for data product development. Which architecture is best suited to implementing it? How do you co-ordinate multiple domain-oriented teams and use common data infrastructure software, such as data fabric, to create high-quality, compliant, reusable data products in a Data Mesh? Is there a methodology for creating data products? And how can you use a data marketplace to share data products and govern their sharing? The objective is to shorten time to value while also ensuring that data is correctly governed and engineered in a decentralised environment. The workshop also looks at the organisational implications of Data Mesh and at how to create shareable data products for use as master data, in a data warehouse, in data science, in graph analysis and in real-time streaming analytics to drive business value. Technologies discussed include data catalogs, data fabric for collaborative development of the data integration pipelines that create data products, DataOps to speed up the process, data orchestration automation, data observability and data marketplaces.

  • What are data products?
  • What makes creating data products different from other approaches to creating data for use in analytical workloads?
  • A best practice methodology for creating data products
  • How to design semantically linked data products to enable rapid consumption and use of data to produce new insights
  • Quick start mechanisms to speed up data product design
  • Defining common business data names for data products in a business glossary
  • Data modelling techniques for data products
  • Discovering data needed to build data products using a data catalog
  • Developing DataOps pipelines to engineer the data needed using data fabric (see the sketch after this outline)
  • Publishing data products – the role of the data marketplace
  • Governing access to and use of data products across the enterprise
  • Consuming and assembling data products for use in multiple analytical workloads
  • Technologies and skills needed.
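As referenced in the outline above, the following is a minimal, hypothetical sketch of the kind of modular DataOps pipeline step a domain team might develop: ingest source data discovered via a data catalog, engineer it, check a simple quality SLA and publish a consumable output. The file paths, column names and threshold are illustrative assumptions; in practice a data fabric, orchestration or automation tool would typically generate, schedule and observe steps like these.

```python
# A minimal sketch of a modular DataOps pipeline step that builds and publishes a data product.
# File paths, column names and the quality threshold are hypothetical examples.
import pandas as pd


def ingest(path: str) -> pd.DataFrame:
    """Read raw source data that was discovered via the data catalog."""
    return pd.read_csv(path)


def engineer(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean and integrate the data: standardise names, remove duplicates, derive fields."""
    df = raw.rename(columns=str.lower).drop_duplicates(subset=["order_id"])
    df["order_value"] = df["quantity"] * df["unit_price"]
    return df


def validate(df: pd.DataFrame, min_completeness: float = 0.99) -> None:
    """Enforce a simple data quality SLA before the product is published."""
    completeness = df["order_id"].notna().mean()
    if completeness < min_completeness:
        raise ValueError(f"Completeness {completeness:.2%} is below the SLA of {min_completeness:.0%}")


def publish(df: pd.DataFrame, path: str) -> None:
    """Write the consumable output port; registering it in the marketplace would follow."""
    df.to_parquet(path, index=False)


if __name__ == "__main__":
    raw = ingest("raw/sales/orders.csv")          # hypothetical source registered in the catalog
    product = engineer(raw)
    validate(product)
    publish(product, "products/sales/orders_daily.parquet")
```

Keeping each step small and separately testable is what allows domain teams to develop pipelines collaboratively and reuse them across data products.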