Jutta Tynkkynen

Data Specialist

The Data Vault can be many things; you could even use it to help solve crimes. But what exactly is Data Vault (DV), or its more evolved DV2.0 specification?

The Data Vault 

In short, DV is a way to model very large data warehouses and to load data into them fast. Being able to load huge amounts of data quickly and get them into reports efficiently is something that companies struggling with their data could appreciate.

Data Vault accomplishes this by starting from the business: you first define the key business concepts and how they are linked to each other. The data vault is then built around these concepts to store all of the data, so that nothing is left out and the history of the data is kept.

Thus, all the raw data from the different sources is stored in the data vault and can be extracted from it into data marts (information marts), which are built specifically for reporting.

In addition, DV makes it possible to trace data back to its source, so that all data can be audited. For every single row in the database, one can tell where it came from.

Most importantly, a data vault is scalable. New sources, new tables and large volumes of additional data can be added to the data vault without changing the existing database and without compromising the efficiency of the data vault.

But how does DV do it then?

The basic structure of DV consists of three key elements: the hubs, the satellites and the links.

A hub represents a business concept, for example a criminal, and contains its unique business keys (for example a unique criminal number). The hub, like the other elements, may also contain a surrogate key that links it to the other structures in the model, as well as the load date and the record source for auditing.
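
As a rough illustration, a hub row for the criminal example could be sketched in Python as below. The class, field and function names are hypothetical, and the surrogate key is assumed here to be an MD5 hash of the normalized business key, which is one common choice rather than the only one.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys."""
    # Trimming and upper-casing makes " c-1001 " and "C-1001" hash identically.
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubCriminal:
    criminal_hk: str      # surrogate (hash) key used to join links and satellites
    criminal_number: str  # the business key
    load_date: datetime   # when the key first arrived in the vault
    record_source: str    # which source system delivered it (for auditing)


def make_hub_criminal(criminal_number: str, record_source: str) -> HubCriminal:
    return HubCriminal(
        criminal_hk=hash_key(criminal_number),
        criminal_number=criminal_number,
        load_date=datetime.now(timezone.utc),
        record_source=record_source,
    )


# "records_system" is a made-up source name for the example.
hub = make_hub_criminal("C-1001", "records_system")
```

Note that the hub carries no descriptive attributes at all: only the key, when it was loaded and where it came from.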

Links are the relationships between the hubs: each link lists associations between two or more business keys, for example between criminals and crimes. This way many-to-many relationships can be modeled, and the database can store, for example, which crimes each criminal has committed.
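
The many-to-many idea can be sketched with an in-memory SQLite database. All table and column names below are hypothetical, and short readable strings stand in for the surrogate keys to keep the example easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_criminal (
    criminal_hk     TEXT PRIMARY KEY,
    criminal_number TEXT NOT NULL,
    load_date       TEXT NOT NULL,
    record_source   TEXT NOT NULL
);
CREATE TABLE hub_crime (
    crime_hk      TEXT PRIMARY KEY,
    crime_number  TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- The link holds only keys: one row per criminal/crime association.
CREATE TABLE link_criminal_crime (
    link_hk       TEXT PRIMARY KEY,
    criminal_hk   TEXT NOT NULL REFERENCES hub_criminal (criminal_hk),
    crime_hk      TEXT NOT NULL REFERENCES hub_crime (crime_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
""")

src, day = "records_system", "2010-03-01"  # made-up source name and load date
conn.executemany("INSERT INTO hub_criminal VALUES (?, ?, ?, ?)",
                 [("hk-c1", "C-1001", day, src), ("hk-c2", "C-1002", day, src)])
conn.executemany("INSERT INTO hub_crime VALUES (?, ?, ?, ?)",
                 [("hk-k1", "K-1", day, src), ("hk-k2", "K-2", day, src)])
# One criminal with two crimes, and one crime shared by two criminals.
conn.executemany("INSERT INTO link_criminal_crime VALUES (?, ?, ?, ?, ?)",
                 [("l1", "hk-c1", "hk-k1", day, src),
                  ("l2", "hk-c1", "hk-k2", day, src),
                  ("l3", "hk-c2", "hk-k1", day, src)])

crimes_per_criminal = conn.execute("""
    SELECT c.criminal_number, k.crime_number
    FROM link_criminal_crime AS l
    JOIN hub_criminal AS c ON c.criminal_hk = l.criminal_hk
    JOIN hub_crime    AS k ON k.crime_hk    = l.crime_hk
    ORDER BY c.criminal_number, k.crime_number
""").fetchall()
```

Because the relationship lives in its own table, a new kind of association can be added as a new link without touching the existing hubs.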

Satellites contain the descriptive information about hubs or links, i.e. the attributes that may change over time. Examples are the height, hair color, eye color or modus operandi (M.O.) of a criminal, the M.O. being the known way the criminal usually commits his or her crimes.
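
To make the "may change over time" part concrete, here is a minimal sketch of a satellite as a list of dated rows, with a helper that returns the description valid on a given date. The names and sample values are invented for illustration, and load dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SatCriminalDescription:
    criminal_hk: str     # key of the hub row being described
    load_date: str       # ISO date, e.g. "2010-03-01"
    record_source: str
    height_cm: int
    hair_color: str
    modus_operandi: str


# Two versions of the same criminal's description; the old one is never overwritten.
history = [
    SatCriminalDescription("hk-1001", "2010-03-01", "records_system",
                           182, "brown", "burglary at night"),
    SatCriminalDescription("hk-1001", "2012-07-15", "records_system",
                           182, "grey", "burglary at night"),
]


def description_as_of(rows: List[SatCriminalDescription],
                      criminal_hk: str,
                      as_of: str) -> Optional[SatCriminalDescription]:
    """Return the latest row loaded on or before `as_of`, or None if there is none."""
    candidates = [r for r in rows
                  if r.criminal_hk == criminal_hk and r.load_date <= as_of]
    return max(candidates, key=lambda r: r.load_date, default=None)


in_2011 = description_as_of(history, "hk-1001", "2011-01-01")
in_2013 = description_as_of(history, "hk-1001", "2013-01-01")
```

Keeping every version as its own row is what makes this kind of point-in-time question answerable.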

With this basic but intuitive structure, the raw data coming from the different sources can be stored in the database. The loading itself is insert-only, which is usually faster than updating existing rows. In addition, DV allows the use of hash keys to speed up storing and retrieving data. Thus, the data vault can be scaled efficiently.
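
An insert-only load with hash keys might look roughly like the sketch below, again using an in-memory SQLite database. The table layout, the `load_row` helper and the `hash_diff` column (a hash of the descriptive attributes, used to detect changes) are assumptions made for the example, not a prescribed implementation.

```python
import hashlib
import sqlite3


def hash_key(*parts: str) -> str:
    """Deterministic MD5 hash over normalized input values."""
    normalized = "|".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_criminal (
    criminal_hk     TEXT PRIMARY KEY,
    criminal_number TEXT NOT NULL,
    load_date       TEXT NOT NULL,
    record_source   TEXT NOT NULL
);
CREATE TABLE sat_criminal (
    criminal_hk   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    hash_diff     TEXT NOT NULL,
    hair_color    TEXT,
    height_cm     INTEGER,
    PRIMARY KEY (criminal_hk, load_date)
);
""")


def load_row(conn, criminal_number, hair_color, height_cm, load_date, record_source):
    """Load one source row using INSERT statements only; nothing is updated."""
    hk = hash_key(criminal_number)
    # A business key that is already known is simply skipped.
    conn.execute("INSERT OR IGNORE INTO hub_criminal VALUES (?, ?, ?, ?)",
                 (hk, criminal_number, load_date, record_source))
    # A new satellite row is written only when the attributes have changed.
    diff = hash_key(hair_color, str(height_cm))
    latest = conn.execute(
        "SELECT hash_diff FROM sat_criminal WHERE criminal_hk = ? "
        "ORDER BY load_date DESC LIMIT 1", (hk,)).fetchone()
    if latest is None or latest[0] != diff:
        conn.execute("INSERT INTO sat_criminal VALUES (?, ?, ?, ?, ?, ?)",
                     (hk, load_date, record_source, diff, hair_color, height_cm))


load_row(conn, "C-1001", "brown", 182, "2010-03-01", "records_system")
load_row(conn, "C-1001", "brown", 182, "2010-03-02", "records_system")  # unchanged
load_row(conn, "C-1001", "grey", 182, "2012-07-15", "records_system")   # changed

hub_count = conn.execute("SELECT COUNT(*) FROM hub_criminal").fetchone()[0]
sat_count = conn.execute("SELECT COUNT(*) FROM sat_criminal").fetchone()[0]
```

Since the hash key is computed from the business key alone, hubs, links and satellites can each derive it independently, without first looking up a generated ID in another table.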

Meanwhile, at the Edmonton Police Service…

The Edmonton Police Service chose the Data Vault architecture for its data warehouse in order to have “a centrally managed source of truth and the ability to provide time-based analysis”, says Liam Hicks, Team Lead of Data Management Services at the Edmonton Police Service.

The individual operational source systems did not offer the Edmonton Police a means to keep track of changes over time, nor any kind of analytics reporting. With the help of DV, which they adopted in 2010, they have reached the objectives they were aiming for: the data vault they now have gives them the kind of analytics and reporting they want.

And as Liam Hicks states, the benefits of using DV are vast: “Extensibility, ease of adopting new business needs without incurring a lot of maintenance overhead, and adoption of standardized load patterns which allows for improved reusability.”

Conclusion

Data Vault’s ability to bring together enormous amounts of data from different sources very fast can be very helpful, even for crime fighters. It may also help IT system developers solve the real mystery: how to efficiently master huge amounts of data from all kinds of different sources.

Thanks to Liam Hicks for the interview!

More information about DV can be found at datavaultalliance.com, an online community where members collaborate on Data Vault solutions.

About the author

Jutta Tynkkynen

Data Specialist

Jutta works at CGI on data warehouse ETL solutions and reporting. At CGI she has been involved in specifying, designing, implementing and maintaining several different custom-built data warehouses that support customers’ business.