
Good Data is a Virtue

One of my favorite sayings about data quality is ‘Garbage In, Garbage Out’. Data needs to be respected, yet many organizations are apathetic about the need to control and manage data quality. The task typically falls to administration functions, but they often lack an understanding of the underlying information.

And those same organizations are the first to complain about bad data.

Quality is not an act, it is a habit

– Aristotle

Bad Habit Practices

Practice 1: Information is power. Let’s maintain our own data silos – information is shared only with those who need to know. “It’s the way senior management likes it!”

Practice 2: Role perception: “I am too busy to waste time administering data entry. The procedure is not user friendly”; and there is little incentive, or penalty, to enter data completely.

Practice 3: Data capture importance: “It’s not my problem – someone else will check it and clean it.” This is further complicated because many organizations are constantly changing – there is a lack of consistency, and the goalposts are always moving!

Practice 4: Poor data transparency makes it easier to hide true performance. It suits some to keep it opaque, as “we are more likely to keep our jobs”.

Cost of Quality

The cost of quality rule (1:10:100) illustrates how the cost of an error builds up exponentially as it moves down the value chain. The rule states that the cost increases by a factor of 10 for each stage of the chain in which the error remains undetected. For example, remedying a quality issue captured at the start of a manufacturing process costs $1; if that same issue were to remain undetected through the end of the manufacturing process and go on to impact an external customer, the cost of quality failure would be $100. The lesson is that prevention is better than the cure: prevention is less costly than correction, and correction is less costly than failure.
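
To make the escalation concrete, here is a minimal sketch of the 1:10:100 arithmetic. The stage names and dollar figures are illustrative assumptions only, not figures from any specific organization.

```python
# Illustrative sketch of the 1:10:100 cost-of-quality rule.
# Stage names and dollar values are assumptions for illustration only.

BASE_COST = 1           # cost to fix an error where it is first captured
ESCALATION_FACTOR = 10  # cost multiplies by 10 at each undetected stage

stages = [
    "data entry (prevention)",
    "internal processing (correction)",
    "external customer (failure)",
]

for i, stage in enumerate(stages):
    cost = BASE_COST * ESCALATION_FACTOR ** i
    print(f"Error first caught at {stage}: ${cost} per defect")

# Output:
# Error first caught at data entry (prevention): $1 per defect
# Error first caught at internal processing (correction): $10 per defect
# Error first caught at external customer (failure): $100 per defect
```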

Despite this rule, some organizations pursue workaround strategies. Unfortunately these strategies do not address the root causes, merely paper over the cracks, and end up being more costly than a strategy that targets the source data directly.

Correction Approaches

Challenge 1: Data analysis tools, such as spend analytics solutions, will isolate bad data and map only the good data. Aside from being after the event, the challenge is: who trains the solution to know what good and bad look like, and what happens with the bad data? Additionally, if inputs are inconsistent and highly variable, this becomes a never-ending, high-touch exercise, and any gaps will cause the entire data set to be questioned.

Challenge 2: Send all the data to be screened, audited and cleansed. This suffers from the same limitations as Challenge #1, as well as introducing potential delays. Do you have an army to administer this? Probably not – a more robust approach is required!

Whilst these workarounds can complement an organization’s capability to control and manage data, a better way is to initiate good data at the start.

Strategy Tips to Achieve ‘Right First Time’ Data

  1. Create user accountability – bad data and poor workmanship are the result of a cultural habit that disregards the significance of good data. Leaders must champion data quality.
  2. Join up data sources electronically. Ensure you have a single source of truth for the respective datasets.
  3. Standardize the data terminology, format and follow a logical hierarchy.
  4. Structure the data to ensure that it works for the different user perspectives. Enrich the data where appropriate for the user, but it must remain connected with the source of truth.
  5. Enter the complete information once. Combine a maniacal attention to detail at the start of the process with the use of templates and checklists. Design solution forms to elicit data entry in the most user-friendly and intuitive manner, and avoid forms that contain irrelevant or blank fields.
  6. Utilize automated system rule sets to perform stage 1 ‘checks and balances’ and prevent garbage from being entered in the first place (a minimal sketch of such a rule set follows this list).
  7. Reduce the temptation to have multiple approvals to validate the data. This approach has a poor success rate (as well as delaying the process). See item #1.
  8. Employ users that ‘get it’ – avoid those that do not. Good and bad data cannot be mixed – this corrupts the entire data set.
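
As a rough illustration of item 6, the sketch below shows what a stage 1 rule set might look like. The field names, rules and sample record are hypothetical, and a real implementation would sit inside whatever form or ingestion tool the organization already uses.

```python
import re

# Hypothetical stage 1 rule set: each rule rejects a record at the point
# of entry, rather than relying on downstream cleansing.
RULES = {
    "supplier_id": lambda v: bool(re.fullmatch(r"SUP-\d{5}", v or "")),  # standard format
    "amount":      lambda v: isinstance(v, (int, float)) and v > 0,      # positive spend value
    "currency":    lambda v: v in {"USD", "EUR", "GBP"},                 # controlled vocabulary
    "description": lambda v: bool(v and v.strip()),                      # no blank fields
}

def validate(record: dict) -> list[str]:
    """Return the list of rule violations; an empty list means the record may be saved."""
    return [field for field, rule in RULES.items() if not rule(record.get(field))]

# Example: an incomplete record is rejected at entry instead of being cleansed later.
record = {"supplier_id": "SUP-00042", "amount": -150, "currency": "usd", "description": ""}
print(validate(record))  # ['amount', 'currency', 'description']
```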

Data integrity is more than good data. It is about establishing processes that control and manage the data to ensure that it is accurate, consistent, complete and timely. With the emergence of big data, getting these fundamentals right will be critical.

We are all accountable for good data. To err is human, but to really foul things up requires a computer.