Text Analytics and the Accounting Profession

Text analytics, also referred to as text mining or textual analysis, consists of data mining techniques and software programs that analyze and extract insights from unstructured textual data, such as PDF documents, HTML pages, emails, and social media posts. These techniques have been generally used in academia, library services, artificial intelligence (AI) research, and marketing.

By J. L. “John” Alarcon, CPA, CGMA, CITP; Yue Liu, PhD; Kevin C. Moffitt, PhD; and Sheneya Wilson, CPA
May 26, 2021, 08:14 AM

The explosion of internet content, combined with innovations such as big data architectures, AI, and machine learning, has significantly enhanced the ability to extract meaningful insights from textual data. In this context, text analytics has entered the world of accounting.

Early adopters, including the Big Four and some other firms, are using text analytics software in accounting, auditing, tax, and business advisory for automation, compliance assurance, fraud detection, or planning purposes. Text analytics algorithms are also being used in various business applications, ranging from customer engagement to business planning and execution.

Here, we discuss the latest trends in text analytics in the accounting profession. We aim to address the following:

  • Applications that exist for accountants
  • The state of adoption and the challenges faced by users
  • How to learn more about these technologies and other practical recommendations

Text Analytics

The definitions of text analytics have evolved over time with innovations in the field. Research firm Gartner explains, “Text analytics is the process of deriving business insight or automation from text. Vendors in this market provide products that extract meaning and context from textual content, which can then be used to derive insights and action, either within the context of the product or by other products to which the data is made available.”1 Similar terms, such as text mining or textual analysis, are often used interchangeably, though with a nuance that we will ignore for the purpose of this article.

Text analytics software has evolved rapidly with the growing adoption of natural language processing (NLP) and machine learning methodologies. NLP is a subfield of AI that focuses on interpreting natural language from speech or text sources and encompasses methodologies for both natural language understanding and natural language generation.

The text analytics software market is composed of tools or solutions that range from software development components that can be embedded into software applications to complete applications dedicated to specific tasks, such as statistical analysis or legal contract reviews. Text analytics vendors include IBM (Watson Natural Language Understanding), SAS (Text Miner), Amazon (Amazon Comprehend), Microsoft (Microsoft Azure Text Analytics), Kira Systems Inc. (contract analysis), and Seal Software (contract analysis). For programmers and technology-savvy data scientists, libraries of text analytics algorithms are available for programming languages such as Python and R.

The functional span of these tools and applications may include preprocessing textual data, analysis and extraction, and reporting or visualization.

Preprocessing textual data converts it into a usable format. It includes data cleaning, normalization, parsing, and semantic analysis (identifying words or groups of words, determining grammatical structure, and defining relationships between words for the purpose of interpreting the meaning of the textual information).
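As a rough illustration of the first three preprocessing steps (cleaning, normalization, and parsing into word tokens), the following Python sketch uses only the standard library. The stop-word list and the sample sentence are hypothetical, and semantic analysis is not attempted here; commercial tools handle far more cases than this.

```python
import re

# Hypothetical stop-word list for illustration; real tools ship far larger lists.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "is", "on"}

def preprocess(raw_text):
    """Clean, normalize, and tokenize a raw text string."""
    text = re.sub(r"<[^>]+>", " ", raw_text)   # cleaning: drop HTML tags
    text = text.lower()                         # normalization: lowercase everything
    # parsing: keep numbers such as "2,500" whole, and split the rest into words
    tokens = re.findall(r"\d+(?:,\d{3})*(?:\.\d+)?|[a-z]+(?:'[a-z]+)?", text)
    return [t for t in tokens if t not in STOP_WORDS]

sample = "<p>The Lessee shall pay rent of $2,500 on the first day of each month.</p>"
print(preprocess(sample))
```

The resulting token list is the kind of input that later analysis and extraction steps would typically work from.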

Analysis and data extraction may include functions such as classifying or grouping documents into clusters based on their similarity. The information that these systems can extract ranges from simple word counts to a rough understanding of context (e.g., location of an event, affiliation of an individual to an organization, or their role in an organization) obtained through relation extraction techniques. Data extraction can also be used to automate data entry by enabling a system to recognize relevant information from a source document, extract it, and process it. Combined with machine learning technologies, these systems improve their performance over time as they recognize patterns and learn from training data sets and ongoing use.
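One common way to judge how similar two documents are is to compare their word counts. The sketch below, which assumes whitespace-separated text and uses made-up sample documents, computes cosine similarity between word-count vectors; it is one simple similarity measure among many, not the method used by any particular vendor.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercase word occurrences in a text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0 = no shared words, 1 = same word mix)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents for illustration.
docs = {
    "lease_a": "tenant shall pay monthly rent to landlord",
    "lease_b": "tenant shall pay annual rent to landlord",
    "invoice": "invoice total due within thirty days",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}

def most_similar(name):
    """Return the name of the other document closest to `name`."""
    others = [n for n in vectors if n != name]
    return max(others, key=lambda n: cosine_similarity(vectors[name], vectors[n]))

print(most_similar("lease_a"))
```

Grouping each document with its nearest neighbors in this way is the basic idea behind similarity-based clustering.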

Text analytics tools also may be used to communicate insights through reports and graphical representations (e.g., word clouds, histograms, or network graphs). This includes automated summarizations that can be fed into question-answering systems, such as those used by chatbots and virtual assistants (e.g., Siri or Alexa). Amazon, Facebook, Google, IBM, and OpenAI are examples of organizations that are innovating in NLP.

Applications for CPAs

Text analytics capabilities, combined with machine learning and optical character recognition (OCR) technology, are being embedded into numerous business applications, including accounting systems such as employee expense or accounts payable automation applications. They are being integrated with robotic process automation (RPA) systems to further streamline document processing, data entry, and other information processing tasks performed by accountants and financial analysts.

Text analytics has also entered the world of tax accountants. For example, Intuit Inc. is among the software vendors that leverage NLP in the tax preparation and planning areas. Intuit’s TurboTax users can now streamline tax preparation and improve the accuracy of their computations by using Intuit’s Tax Knowledge Engine, an application that poses tailored questions to a client and performs computations, with built-in explanations, by “intrinsically correlating and intertwining more than 80,000 pages of U.S. tax requirements and instructions.”2

In their recent book, Artificial Intelligence in Accounting: Practical Applications,3 this feature’s coauthor John Alarcon and Cory Ng summarize the use of text analytics in four families of applications: audit automation, accounting automation, tax automation, and business advisory applications.

Audit automation – Auditors can leverage text analytics to streamline the audit process. One example is using contract analysis software to automate the review of contracts (customer contracts, vendor contracts, employment agreements, etc.). In addition, text analytics enables auditors to increase the value of the audit to clients by detecting trends or risks in large amounts of textual data (PDF documents, SEC filings, press releases, social media posts, and so on) that previously could be audited only through sampling and manual methods. Finally, text analytics enables more timely and continuous audits, which can help improve decision-making and the detection of anomalies (such as errors, compliance violations, or fraud).

Accounting automation – Leveraging text analytics capabilities, such as intelligent information extraction, accounting systems combined with OCR technology and RPA applications can take automation to the next level. Systems can now extract the relevant information from receipts, invoices, or other PDF documents, and then automatically record transactions, process them, or make recommendations.
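To give a sense of what information extraction from a scanned document involves, the sketch below pulls three fields out of text that is assumed to have already been produced by OCR. The field labels, patterns, and sample invoice are hypothetical; production systems generally rely on trained models rather than fixed patterns, because real invoices vary widely in layout.

```python
import re
from decimal import Decimal

# Hypothetical field patterns, written for the sample layout below.
PATTERNS = {
    "invoice_number": re.compile(r"\bInvoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "invoice_date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    # \b keeps "Subtotal" from being mistaken for "Total"
    "total": re.compile(r"\bTotal\s*(?:Due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract_invoice_fields(ocr_text):
    """Return the fields found in OCR'd invoice text; missing fields map to None."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    if fields["total"] is not None:
        # Convert "1,250.00" to an exact decimal amount for posting.
        fields["total"] = Decimal(fields["total"].replace(",", ""))
    return fields

sample_invoice = """ACME Supplies
Invoice No: INV-1042
Date: 2021-03-15
Subtotal: $1,150.00
Total Due: $1,250.00"""

record = extract_invoice_fields(sample_invoice)
print(record)
```

A record like this could then be passed to an accounting system or RPA workflow, with any field left as None routed to a person for review.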

Tax automation – As Alarcon and Ng suggest in their book, text analytics powered by machine learning will have a profound effect on tax automation. It is not inconceivable to see the emergence of question-answering systems or virtual assistants that augment humans in addressing client-specific tax questions. Similar tools could arise in tax audits or tax litigation analysis.

Business advisory applications – Text analytics has many potential applications in various business advisory areas, such as managerial accounting, forecasting, internal audit, compliance reviews, IT audits, and forensic accounting.4

Uses and Challenges

We conducted interviews with several small to midsize accounting firms to understand how they perform textual analysis. We found that textual analysis is currently accomplished with varying degrees of automation, and that a variety of documents are being analyzed, including public filings, journal entries, contracts and other legal documents, board meeting transcripts and minutes, tax documents, mortgage and lease documents, partnership agreements, emails, and social media posts. Matching information across related documents using textual analysis is a common task, as is the manual verification of computer output.

Even so, each firm interviewed cites similar benefits: savings of time and money, increased efficiency, and improved accuracy in risk assessment. Firms also use textual analysis to gain a high-level understanding of documents and to drill down into the content surrounding specific matched key words and phrases. For example, in complex lease documents textual analysis can identify information about covenants and guarantees and key point occupancy agreements.

The major difficulty in implementing textual analysis is employee training (including securing employee buy-in). Younger employees, on average, seem to be more eager to learn and implement new technologies compared with those who feel more comfortable implementing a familiar approach. One firm noted that when clients realize the benefits of textual analysis, they demand its use in the engagement. Thus, client demand will likely be an impetus for textual analysis adoption in many accounting firms.
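The drill-down described above, examining the text around matched key words, is often called a keyword-in-context search. The following Python sketch shows the idea for single-word keywords; the function name, window size, and lease wording are assumptions made for illustration and do not reflect any interviewed firm's tooling.

```python
import re

def keyword_in_context(text, keyword, window=5):
    """Return each match of `keyword` with up to `window` words on either side."""
    words = text.split()
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        # Compare on letters only so "Covenants," still matches "covenants".
        if re.sub(r"[^a-z]", "", word.lower()) == target:
            start = max(0, i - window)
            hits.append(" ".join(words[start:i + window + 1]))
    return hits

# Hypothetical lease wording for illustration.
lease = ("The Tenant agrees to the financial covenants set out in Schedule B. "
         "The Guarantor provides a guarantee of all rent obligations.")

for snippet in keyword_in_context(lease, "covenants", window=3):
    print(snippet)
```

A reviewer can scan the returned snippets quickly and open the full document only where the surrounding language warrants a closer look.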

Learn More about Text Analytics

Text analytics has emerged as a new area of interest among accounting and finance educators. A number of academic papers explore different applications of text analytics in accounting. Tim Loughran and Bill McDonald published a comprehensive survey on this topic,5 with a good introduction to the related literature. The article provides a foundation for an overall understanding of text analytics in accounting and finance. Recent papers that are more closely related to audit may help CPAs form ideas about how text analytics can be used in those engagements. James R. Moon Jr. and Quinn T. Swanquist, for instance, introduce a way to use text analytics to measure the misreporting risk in 10-K disclosures.6 Other examples illustrate how auditors can use text analytics to evaluate risks in engagements or how analyzing the tone in firm disclosures can reduce type II errors in the evaluation of going-concern uncertainties.7
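As a simple illustration of what measuring tone can mean, the sketch below scores a passage by counting words from a positive list and a negative list. The word lists here are tiny and hypothetical, and this is not the method of any paper cited above; published research typically relies on much larger, domain-specific dictionaries and more careful modeling.

```python
import re

# Tiny hypothetical word lists, for illustration only.
NEGATIVE = {"loss", "losses", "doubt", "default", "adverse", "impairment", "litigation"}
POSITIVE = {"growth", "improved", "strong", "gain", "gains", "profitable"}

def tone_score(text):
    """Net tone = (positive words - negative words) / total words, from -1 to 1."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# Hypothetical disclosure wording.
disclosure = "Recurring losses and a covenant default raise substantial doubt about the company"
print(tone_score(disclosure))
```

A score well below zero flags a passage as predominantly negative, which an auditor might treat as one input among many when assessing risk.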

For accounting and finance professionals in general, software vendors can be a useful source of information. They provide various tools with text analytics functions, and some are tailored for the accounting profession. Most have detailed, step-by-step tutorials on their websites (some also with video guides) that make it easier for users to understand them.

Examples of useful tutorials and learning resources include Google Cloud Natural Language, IBM Watson Natural Language Understanding, and Text Analytics & Sentiment Analysis API Text2Data. For details, see “Illustrative Text Analytics Tool Descriptions” below.

Another example is Kira Systems, a tool for contract analysis that may be especially relevant for auditors. It helps reduce the large volumes of time-consuming manual review required for an audit, and allows auditors to produce client audit files and reports for easy clause comparison. A number of accounting firms are already using Kira, including Deloitte, Plante & Moran PLLC, BKD LLP, Cherry Bekaert, Dixon Hughes Goodman LLP, and Moss Adams LLP.

An advantage of the tools discussed here is that they don’t require any programming. Users select the analysis they want to perform, and the system produces and presents results in an organized way. A disadvantage is that they usually do not allow complete customization of an analysis, although most functions have been established based on market needs.

Technology-savvy CPAs with a programming background (or an interest in learning programming) may find it more flexible to develop their own analysis. Python and R are probably the most popular programming languages used for textual analysis. There are open source packages built for different types of analysis, and it is relatively easy to find tutorials and demonstrations about these packages on the internet. For example, Pratik Shukla and Roberto Iriondo provide a Python NLP tutorial that covers many specific packages and techniques used for text analytics in Python, such as Natural Language Toolkit, word cloud, stemming, and part-of-speech tagging, among others.8

There are also online courses about text analytics. For example, Coursera offers a course by ChengXiang Zhai of the University of Illinois at Urbana-Champaign, called “Text Mining and Analytics,” that covers major techniques for analyzing texts to discover patterns and extract useful knowledge. Though the course does not specifically address issues in accounting and auditing, it discusses the basic concepts, principles, and major algorithms in text mining and potential applications. It is likely most helpful for beginners who want a broad understanding of text analytics.

Accounting educators, too, have started to bring text analytics into the accounting curriculum. For example, Kevin Moffitt from Rutgers Business School (coauthor of this article) teaches a course called “Decoding of Textual Corporate Communications” for graduate students. The course focuses on applying text analytics to the broad accounting area. It discusses common and advanced text mining techniques, including the traditional bag-of-words methods as well as machine learning and deep-learning-based methods. The course also teaches students how to apply these techniques with Python and how to use them in accounting.


Whether embedded in business applications or offered as stand-alone applications, text analytics technologies have entered the accounting profession. CPAs have access to a wide range of texts that can be helpful in assessing risk or exercising professional judgment. To capture useful information from these texts effectively and efficiently, text analytics tools are becoming necessary solutions with which CPAs should become more familiar.

The Big Four and some other accounting firms have already taken action to incorporate text analytics capabilities into their processes for audit risk assessment, tax work, contract analysis, and other tasks. It is critical for CPAs to learn more about text analytics and to understand how to use these technologies moving forward to stay competitive.

1 Market Guide for Text Analytics, Gartner Research (Nov. 5, 2018).
2 Gang Wang, “Tech Talk: Intuit’s AI-Powered Tax Knowledge Engine Boosts Filers’ Confidence,” Intuit blog (March 6, 2019). https://www.intuit.com/blog/social-responsibility/tech-talk-intuits-ai-powered-tax-knowledge-engine-boosts-filers-confidence. Intuit also leverages natural language processing and machine learning in accounting automation. See for example: Amar Mattey, Joy Rimchala, TJ Torres, and Xiao Xiao, “Building a Financial Document Understanding Platform,” Intuit (Nov. 14, 2019). https://medium.com/intuit-engineering/building-a-financial-document-understanding-platform-9e42f7d497c.
3 Cory Ng and John Alarcon, Artificial Intelligence in Accounting: Practical Applications, Routledge Taylor & Francis Group (2021).
4 See for example: George R. Aldhizer, “Visual and Text Analytics: The Next Step in Forensic Auditing and Accounting,” The CPA Journal (2017), and Daniel Torpey and Vincent Walden, “Accounting for Words: Text Analytics Technology May Help Internal Auditors Uncover Hidden Risks and Gain Greater Insight on Business Performance,” Internal Auditor (August 2009).
5 Tim Loughran and Bill McDonald, “Textual Analysis in Accounting and Finance: A Survey,” Journal of Accounting Research (September 2016, Vol. 54, Issue 4).
6 James R. Moon Jr. and Quinn T. Swanquist, “Measuring Misreporting Risk in Firms’ 10-K Disclosures and the Auditor’s Role in Mitigating Misstatements” (September 2018). https://ssrn.com/abstract=2997967
7 Manlu Liu, PhD, Kean Wu, PhD, Rong Yang, PhD, and Yang Yu, PhD, “Textual Analysis for Risk Profiles from 10-K Filings: Evidence from Audit Opinions,” The CPA Journal (June 2020, Vol. 90, Issue 6), pages 36-41; Mahmud Hossain, Kannan Raghunandan, and Dasaratha V. Rama, “Abnormal Disclosure Tone and Going Concern Modified Audit Reports,” Journal of Accounting and Public Policy (2020, Vol. 39, Issue 4).
8 Pratik Shukla and Roberto Iriondo, “Natural Language Processing (NLP) with Python – Tutorial.” https://medium.com/towards-artificial-intelligence/natural-language-processing-nlp-with-python-tutorial-for-beginners-1f54e610a1a0


Illustrative Text Analytics Tool Descriptions

Google Cloud Natural Language is a powerful general-purpose platform that enables users to analyze text on the cloud. It provides three tools: AutoML Natural Language, Natural Language API, and Healthcare Natural Language AI.

• AutoML Natural Language allows users to upload documents (e.g., orders, receipts, invoices) in various formats (e.g., PDFs, texts, or emails) to train their own models for four different objectives (single-label classification that classifies documents by assigning a label, multilabel classification that allows a document to have multiple labels, entity extraction that identifies entities within documents, and sentiment analysis that analyzes attitudes within documents).

• Natural Language API provides pretrained models that allow users to easily apply natural language understanding to their applications with features including entity analysis, sentiment analysis, syntax analysis, and content classification.

• Healthcare Natural Language AI is used to gain insights from medical documents.

Entity analysis allows users to identify people, things, numbers, etc. Sentiment analysis assigns a sentiment score to the text to show how positive or negative it is. Syntax analysis extracts tokens and sentences, identifies parts of speech, and creates dependency parse trees for each sentence. Content classification sets documents into over 700 predefined categories to quickly gauge what a document is about.

IBM Watson Natural Language Understanding provides advanced text analytics, including functions that allow users to identify important keywords in the text and identify high-level concepts that aren’t necessarily directly referenced in the text. Other tools include a relations function that recognizes when two entities are related and identifies the type of relation. Another function analyzes emotion conveyed in text, such as joy, sadness, anger, or fear.


J. L. “John” Alarcon, CPA, CGMA, CITP, is a principal at BEARN LLC, a business advisory services firm in Philadelphia, and a member of the Pennsylvania CPA Journal Editorial Board. He can be reached at john.alarcon@bearnllc.com.


Yue Liu, PhD, is an associate professor in the accounting department at Southwestern University of Finance and Economics in Chengdu, China. She can be reached at lyuejune@qq.com.

Kevin C. Moffitt, PhD, is an associate professor in the accounting and information systems department at the Rutgers Business School in Newark, N.J. He can be reached at kevin.moffitt@business.rutgers.edu.

Sheneya Wilson, CPA, is a teaching assistant and a PhD candidate in the accounting and information systems department at the Rutgers Business School in Newark, N.J. She can be reached at sheneyawilson@gmail.com.
