Interest in the meaning of data is growing. Data lineage – the ability to trace data back to its meaning and to the reason it is used – is becoming a critical success factor. At the same time, the growing variety of data requires a firm grip on the individual data sources. The shortage of data specialists makes it necessary to make existing knowledge explicit. Finally, the introduction of distributed data architectures provides the last push to “clean up the data attic”.
Processing data is therefore not only a logistical challenge; it also requires a reliable approach to capturing the meaning of data. Such an approach must go beyond the traditional description of the data warehouse's structure: it must be semantic.
A semantic approach takes the problem space, the domain about which data is collected, as the starting point for the description. An accurate analysis and model of that domain forms the basis for translating it into a model of the data as it manifests in the solution space. The result can be seen as a knowledge graph: a network of connected (linked) data that includes the definitions of that data and its lineage back to its basis in legislation, compliance guidelines and company definitions.
This approach is relevant beyond the data warehouse: it yields an explicit, unambiguous record of an organization's knowledge about its relevant data. Marco Brattinga takes you into the world of enterprise semantic data management through the following topics:
- The relevance of semantics for the data warehouse
- The knowledge graph: linking data by and with metadata
- The problem space versus the solution space
- Semantic modeling and data lineage
- The importance of an augmented data catalog
- Best practices for implementing data lineage