Dirty data is dirty oil!

In the age of big data and AI, companies are re-discovering the value of their data. British mathematician Clive Humby proclaimed that data is the new oil.

But companies must come to terms with the quality of the data they have. Most insurers run disparate systems that were never designed with a vision of how their data could be combined to yield analytic insights.

Yes, data is the new oil, but not if the new oil is dirty!

While most IT departments do enforce data quality checks in their databases, those checks are typically rule based and primarily designed to catch violations of data definitions. For example, a numeric field should not contain text, and an interest rate should not, normally, be greater than 100% or less than 0%. Even so, it is not possible to anticipate every possible type of violation.
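As a rough sketch of what such rule-based checks look like in practice (the field names, sample data, and thresholds below are purely illustrative), a couple of pandas checks might be:

```python
import pandas as pd

# Hypothetical claims extract; column names and values are illustrative only.
claims = pd.DataFrame({
    "claim_amount": ["1200.50", "800", "abc"],   # numeric field with a stray text entry
    "interest_rate": [0.03, 1.25, -0.01],        # rates stored as decimals (3% = 0.03)
})

# Rule 1: a numeric field should not contain text.
amount_numeric = pd.to_numeric(claims["claim_amount"], errors="coerce")
bad_amounts = claims[amount_numeric.isna()]

# Rule 2: an interest rate should normally fall between 0% and 100%.
bad_rates = claims[(claims["interest_rate"] < 0) | (claims["interest_rate"] > 1)]

print(bad_amounts)   # rows violating the numeric-field definition
print(bad_rates)     # rows violating the interest-rate range
```

Checks like these catch records that violate a field's own definition, but each rule only covers what someone thought to write down.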

Worse yet, there is one type of data error that these methods virtually never detect: logical inconsistency. One example is a claim for cervical cancer from a male policyholder. While "male" and "cervical cancer" are each legitimate under their own data definitions, putting them together creates a logical inconsistency that is most likely the result of a data entry error. Logical inconsistencies between fields are exceedingly hard to detect.
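A minimal sketch of a cross-field consistency rule, assuming hypothetical field names and a hand-curated list of sex-specific diagnoses (an illustration of the idea, not Relacio's actual method):

```python
import pandas as pd

# Hypothetical policyholder claim records; field names and codes are illustrative.
claims = pd.DataFrame({
    "policyholder_sex": ["M", "F", "M"],
    "diagnosis": ["cervical cancer", "cervical cancer", "fracture"],
})

# Cross-field rule: a diagnosis that is only possible for one sex
# should not appear with the other sex on the same record.
FEMALE_ONLY_DIAGNOSES = {"cervical cancer", "ovarian cancer"}

inconsistent = claims[
    claims["diagnosis"].isin(FEMALE_ONLY_DIAGNOSES)
    & (claims["policyholder_sex"] == "M")
]
print(inconsistent)  # flags the male policyholder with a cervical cancer claim
```

The difficulty is that every such rule has to be anticipated and written by hand, field pair by field pair, which is why these errors usually slip through.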

This type of data problem inevitably biases analysis because analysts cannot control for it. Conventional data errors, by contrast, are known to exist and can always be excluded from the analysis if needed.

Relacio has a unique ability to detect logical inconsistency errors in the data. A deep sweep of the data allows us to spot them. Combined with conventional data error checking, this capability gives Relacio a good handle on how good the data really is.

Companies may be surprised by how bad their data error problems are. In one case, we estimated that 10% of the claims data, measured by claims amount, contained errors. In another case, the figure was as high as 65%.

Accumulating a large amount of dirty oil doesn't do much good. Data is absolutely a competitive advantage. With Relacio, meaningful, automatic, real-time data quality monitoring is achievable.
