We’ve all seen studies showing the enormous amounts of data created on this planet every day. However, a large part of this data is not new, but copied. Existing data architectures, such as data warehouses, involve a great deal of copying, and modern architectures, such as data lakes and data hubs, also rely heavily on copied data. This rampant copying must be reduced. We don’t always think about it, but copying data has many disadvantages, including higher data latency, complex forms of data synchronization, more complex data security and data privacy, higher development and maintenance costs, and degraded data quality. It is time to apply the data minimization principle when designing new data architectures: the aim is to minimize the amount of copied data. In other words, users gain more access to the original data, moving from data-by-delivery to data-on-demand. This mirrors the shift in the movie industry from renting videos at a store to video-on-demand. In short, data minimization means that we are going to ‘Netflix’ our data.
Topics covered:

- The effect of data minimization on data warehouses, data lakes, and data hubs
- The network becomes the database
- Use of translytical databases, analytical databases, and data virtualization to apply data minimization (see the sketch after this list)
- Focus on business rules rather than on data storage
- Examples of applying data minimization to existing data architectures
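To make the data-on-demand idea concrete, here is a minimal sketch using Python’s `sqlite3` module as a stand-in for a source system. The table, view, and column names (`customer`, `revenue_per_region`, and so on) are hypothetical, and the plain SQL view merely illustrates the kind of virtual, rule-based access that data virtualization products provide; it is not any particular product’s API.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,
        region  TEXT,
        revenue REAL
    );
    INSERT INTO customer (region, revenue) VALUES
        ('EMEA', 120.0), ('EMEA', 80.0), ('APAC', 200.0);
""")

# Data-by-delivery: physically copy derived data into a new table.
# This copy must now be refreshed, secured, and kept consistent.
conn.execute("""
    CREATE TABLE revenue_per_region_copy AS
    SELECT region, SUM(revenue) AS total_revenue
    FROM customer GROUP BY region
""")

# Data-on-demand: define a virtual view instead. Only the definition
# (a business rule) is stored; each query reads the original data.
conn.execute("""
    CREATE VIEW revenue_per_region AS
    SELECT region, SUM(revenue) AS total_revenue
    FROM customer GROUP BY region
""")

# New source data is visible through the view immediately, while the
# copied table is already stale until it is refreshed.
conn.execute("INSERT INTO customer (region, revenue) VALUES ('EMEA', 50.0)")
print(dict(conn.execute(
    "SELECT * FROM revenue_per_region ORDER BY region").fetchall()))
# {'APAC': 200.0, 'EMEA': 250.0}
print(dict(conn.execute(
    "SELECT * FROM revenue_per_region_copy ORDER BY region").fetchall()))
# {'APAC': 200.0, 'EMEA': 200.0}  <- stale copy
```

Note that the view stores only metadata, a business rule expressed as a query, while the copied table stores data that must be synchronized; this is the contrast between focusing on business rules and focusing on data storage.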