
CPA Now Blog Archive

This is the archive of CPA Now blogs posted on the PICPA website through April 30, 2025.

Data Quality Management Is the Foundation of Analytics Success

Data analysis is only as good as the data used. To ensure high-quality analytics, CPAs must first ensure high-quality data. Find out the primary dimensions to consider when evaluating the quality of a data set.

Sep 28, 2020, 05:22 AM

By Matt Kraemer, CPA, CIDA


Data analysis is only as good as the data used. To ensure high-quality analytics, CPAs must first ensure high-quality data. Sure, this is a simple concept, but sometimes it proves more difficult during an engagement than in the planning. Too often, we perform analyses only to find that the trends and insights derived from the analysis have become clouded due to poor data.

The primary dimensions to consider when evaluating the quality of a data set are as follows:

  • Completeness
  • Uniqueness
  • Timeliness
  • Validity
  • Accuracy
  • Consistency

Each is important for a different reason.

Completeness – This covers both full records that may be missing from a data set and records with certain fields missing. Either way, incomplete data can lead to inaccurate analytics. Consider the example of a sales-by-customer analysis: if a sale is recorded without a customer name or customer code, the sale will be reflected in the analysis, but it will be just as useless for evaluation as if the full record were missing.
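A completeness check like this is easy to script before any analysis begins. Here is a minimal sketch in Python; the record fields and values are hypothetical, not from any real engagement.

```python
# Hypothetical sales records; field names and values are illustrative only.
sales = [
    {"invoice": 1001, "customer_code": "C-100", "amount": 2500.00},
    {"invoice": 1002, "customer_code": None,    "amount": 1200.00},  # missing code
    {"invoice": 1003, "customer_code": "C-102", "amount": 800.00},
]

# Flag records whose required field is missing before running the analysis.
incomplete = [r for r in sales if not r["customer_code"]]
print(f"{len(incomplete)} of {len(sales)} records are missing a customer code")
```

Running the same filter for each required field gives a quick completeness profile of the data set before any totals are trusted.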

Uniqueness – Evaluate the number of expected records compared with the total number of actual records. Performing tuition analytics at a higher education institution can be skewed if there are duplicate student records (“Matthew Kraemer” and “Matt Kraemer”). Uniqueness simply refers to the absence of duplicates.
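A basic duplicate check can be sketched by normalizing each record key before counting. This example uses a simple lowercase-and-trim normalization on hypothetical names; note that it will not catch the "Matthew" vs. "Matt" case above, which requires fuzzier matching or a reliable student ID.

```python
# Illustrative roster; simple normalization catches case/whitespace duplicates.
students = ["Matthew Kraemer", "Matt Kraemer", "Jane Doe", "matthew kraemer "]

def normalize(name: str) -> str:
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return " ".join(name.lower().split())

seen = set()
duplicates = []
for name in students:
    key = normalize(name)
    if key in seen:
        duplicates.append(name)
    seen.add(key)

print(f"{len(seen)} unique records, {len(duplicates)} duplicates found")
```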

Timeliness – For data analytics to be impactful, data need to be from a recent period, or at least from the time period under evaluation. That is why timeliness is important to data quality. In the context of audit data analytics, this means using data from the period under audit in your analysis. For instance, if you are testing for significant payments to executives, you must use a list of executives who were active employees during the period under audit, not a listing pulled at the time of the audit procedure, which may include an employee who joined the company after the audit year.
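The executive-listing example can be expressed as a date-range filter. This sketch assumes a hypothetical roster with start and end dates (None meaning still employed) and a 2020 audit year.

```python
from datetime import date

# Hypothetical executive roster; None end date means still employed.
executives = [
    {"name": "A. Smith", "start": date(2015, 3, 1),  "end": None},
    {"name": "B. Jones", "start": date(2021, 6, 15), "end": None},  # hired after audit year
    {"name": "C. Lee",   "start": date(2010, 1, 1),  "end": date(2020, 4, 30)},
]

audit_start, audit_end = date(2020, 1, 1), date(2020, 12, 31)

# Keep only executives active at some point during the period under audit.
active = [
    e for e in executives
    if e["start"] <= audit_end and (e["end"] is None or e["end"] >= audit_start)
]
print([e["name"] for e in active])
```

The filter correctly drops the executive hired after the audit year while keeping one who departed partway through it.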

Validity – Records and fields (essentially the rows and columns of a database) must conform to the allowable type and format of the data model. When evaluating accounts receivable balances compared with credit limits for customers, you would expect credit limits to be in numeric or currency format. If you encounter a customer that has an accounts receivable balance of $1 million and a credit limit of “Purple,” the analysis will not be helpful for your audit.
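A validity check simply tests each field against the type the data model allows. This sketch uses hypothetical accounts receivable records, including the "Purple" credit limit from the example above.

```python
# Illustrative AR records; credit_limit should be numeric, but bad data sneaks in.
records = [
    {"customer": "Acme",   "ar_balance": 1_000_000, "credit_limit": 750_000},
    {"customer": "Globex", "ar_balance": 50_000,    "credit_limit": "Purple"},  # invalid
]

def valid_limit(value) -> bool:
    # Accept ints and floats (bool is technically an int in Python, so exclude it).
    return isinstance(value, (int, float)) and not isinstance(value, bool)

invalid = [r["customer"] for r in records if not valid_limit(r["credit_limit"])]
print(f"Customers with invalid credit limits: {invalid}")
```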

Accuracy – The degree to which data correctly describe the "real world" object is critical. The obvious example of inaccurate data is data entered incorrectly. Another example is date formatting. If a European sales detail uses a DD/MM/YYYY sales date format and it is combined with a U.S. sales detail that uses MM/DD/YYYY, the results will be inaccurate. When you perform disaggregated sales analytics by month, a European sale made on Feb. 5, 2020, might be reflected as May 2, 2020, in the analysis.
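The date-format trap is easy to demonstrate: the same string parses to two different dates depending on the format assumed. A minimal sketch:

```python
from datetime import datetime

raw = "05/02/2020"  # a European sale made Feb. 5, 2020

as_european = datetime.strptime(raw, "%d/%m/%Y")  # correct reading: February 5
as_us = datetime.strptime(raw, "%m/%d/%Y")        # wrong assumption: May 2

# Both parses succeed silently, so the error only surfaces in the analysis.
print(as_european.date(), "vs", as_us.date())
```

Because both parses succeed without error, mixed-format data will not raise an exception; the months simply come out wrong, which is why format conventions must be confirmed before combining sources.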

Consistency – This is required within each of your data sets as well as when comparing data from different systems. An example of this is an accounts receivable subsequent liquidation on the full population. For the liquidation to work appropriately, the customer name in the accounts receivable aging report must be consistent with the payor name in the cash receipts report; without that consistency, ABC Co. would not match with ABC Company.
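One common workaround is to normalize name suffixes before matching. This sketch assumes a small, hypothetical suffix-mapping rule; real matching often needs fuzzier techniques and human review.

```python
# Illustrative suffix normalization so "ABC Co." and "ABC Company" compare equal.
SUFFIXES = {"co": "company", "co.": "company", "corp": "corporation", "corp.": "corporation"}

def normalize(name: str) -> str:
    # Lowercase, drop commas, and expand known suffix abbreviations.
    words = name.lower().replace(",", "").split()
    return " ".join(SUFFIXES.get(w, w) for w in words)

ar_name = "ABC Co."        # from the accounts receivable aging report
receipt_name = "ABC Company"  # from the cash receipts report

print(normalize(ar_name) == normalize(receipt_name))
```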

The terms data quality and data integrity are often used interchangeably, but they are slightly different. Data integrity focuses on the validity of the data and asks, "Is the data reasonable, sensible, and not manipulated (whether intentionally or unintentionally)?" This is different from the definition of validity mentioned above (i.e., "Does it conform to the standards of our model?").

Both data quality and data integrity are integral to performing reliable and meaningful data analysis. The good news is there are plenty of tools available for data quality checks. If your firm does not invest in one of those tools, I would encourage it to implement a data quality checklist when performing significant data analytics as part of an audit or service plan.


Matthew Kraemer, CPA, CIDA, is manager of ADAPT consulting services with Schneider Downs & Co. Inc. in Pittsburgh. He can be reached at mkraemer@schneiderdowns.com.


For more coverage of accounting technology issues, mark your calendar to attend PICPA’s Technology Conference on Jan. 7, 2021.

