Dashboard Dangers 7: Samples Continued

in #analytics, 6 years ago

Introduction

If you haven't read the other articles in this series, please check them out:

  1. Dashboard Dangers Part 1
  2. Dashboard Dangers Part 2
  3. Dashboard Dangers Part 3
  4. Dashboard Dangers Part 4
  5. Dashboard Dangers Part 5
  6. Dashboard Dangers Part 6

Samples Continued

A few years ago, I was on a project for a major bank. The bank's auditors had issued it a significant deficiency for not having adequate documentation of its data. The bank asked us to document the lineage of that data, tracing physical data elements (PDEs) from the System of Origin (SOO) or System of Record (SOR) to the financial reports. For simplicity, you can think of a PDE as any number you see on a financial report.

Below are some tables you might see in a bank's financial reports. To preserve the anonymity of the client, we pulled the tables from the financial reports of several different banks.

Any number you see in the tables can be considered a physical data element. The tricky part of a project like this is testing the documentation for accuracy. To tell whether the data was documented properly, you need to run sample data through the documented process and check whether the output matches the financial report. If it doesn't, a calculation was missed in the documentation.

The task was more complex because PDEs were typically aggregations or collections of other PDEs from other systems.
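To make the testing idea concrete, here is a minimal sketch of how a documented lineage could be replayed against sample data. Everything in it is an assumption for illustration: the PDE names, the values, the `evaluate` and `matches_report` helpers, and the rounding tolerance are all made up and do not come from the client project.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical documented lineage: each derived PDE maps to
# (names of its input PDEs, formula that combines them).
Lineage = Dict[str, Tuple[List[str], Callable[..., float]]]


def evaluate(pde: str, lineage: Lineage, source_values: Dict[str, float]) -> float:
    """Compute a PDE by walking the documented lineage back to source values."""
    if pde in source_values:
        return source_values[pde]
    inputs, formula = lineage[pde]
    return formula(*(evaluate(name, lineage, source_values) for name in inputs))


def matches_report(pde: str, lineage: Lineage, source_values: Dict[str, float],
                   reported: float, tolerance: float = 0.005) -> bool:
    """True if the recomputed PDE agrees with the reported figure (assumed tolerance)."""
    return abs(evaluate(pde, lineage, source_values) - reported) <= tolerance


# Made-up sample data standing in for values pulled from the SOO/SOR.
source_values = {
    "retail_loans": 120.0,
    "commercial_loans": 80.0,
    "loan_loss_allowance": 15.0,
}

# Made-up lineage: net_loans is an aggregation of another aggregation.
lineage: Lineage = {
    "gross_loans": (["retail_loans", "commercial_loans"], lambda r, c: r + c),
    "net_loans": (["gross_loans", "loan_loss_allowance"], lambda g, a: g - a),
}

print(evaluate("net_loans", lineage, source_values))                        # 185.0
print(matches_report("net_loans", lineage, source_values, reported=185.0))  # True
print(matches_report("net_loans", lineage, source_values, reported=190.0))  # False
```

A mismatch like the last line is the signal described above: somewhere between the source systems and the report, the documentation and the real process disagree.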

The issue

The project went really well until testing began. A sample of 21 PDEs was selected for testing (because 21 is 1 more than the magic sample size of 20). However, the sample was drawn from the input PDEs, not the output PDEs. As a result, the sample did not include every input needed to reproduce a given output. Imagine the formula A + B - C = D. If A and C are part of the sample but B is not, and you need to prove that D equals the number on your financial report, how will you be able to prove it? The testing methodology eventually had to be changed, because the right sampling criteria were not selected.

Conclusion

If you find value in our posts, please support us by upvoting/commenting/resteeming our articles.

Thanks!
