Personalized Categorization of Financial Transactions


Alex Ran, Intuit, Inc.
Chris Lesner, Intuit, Inc.
Wei Wang, Intuit, Inc.
Marko Rukonic, Intuit, Inc.

ABSTRACT

An important aspect of financial accounting involves organizing business transactions into different categories. Manual categorization of business transactions is a tedious, time-consuming task that has to be done over and over each month, and hence automation of this task is of significant value to the users of accounting software. In this paper we present an analysis of this domain and demonstrate that automatic transaction categorization is similar to personalized tag prediction over shared resources, a problem that is often solved using co-occurrence matrices of tags attached to resources. We show that in the financial accounting domain inverting the relation (collecting co-occurrence statistics of resources over tags rather than of tags over resources) yields better prediction results. We present our method for automated transaction categorization and share selected performance metrics collected at scale from a market-leading accounting system that serves millions of small businesses worldwide.

KEYWORDS

Financial transactions, categorization, accounting, recommendation, tag prediction, personomy, classification

1 INTRODUCTION

A major part of financial accounting is about organizing business transactions into different categories. In accounting, each category is called an account, and the collection of accounts a business uses is called a Chart of Accounts (CoA). Defining Charts of Accounts and using them to categorize business transactions in compliance with accounting guidelines is the core of specialized accounting knowledge. In the past, information about each business transaction had to be manually added to an appropriate account. This is still the case for a large fraction of business transactions. At the same time, with the widespread adoption of online banking, information about many business transactions can be imported from records that are maintained and made available by financial institutions.
The final step of adding these transactions to the accounting books still requires knowledge of what kind of expense or income the transaction represents and which account in the chart of accounts of a given company it fits best. Manual categorization of business transactions is a tedious, time-consuming task that has to be done over and over each month, and hence automation of this task is of significant value to the users of accounting software.

The rest of the paper is organized as follows. First we present some analysis of the data in an attempt to classify the problem and identify candidate solutions and related work. We then present our current solution, which has been in production serving Intuit users for a little over a year. Finally, we share some experimental results and conclusions.

Our contribution is in two areas. First, we present a new approach that is applicable to problems that don't fit neatly into the usual categories of classification and recommendation and that have elements of both supervised and unsupervised learning. Second, our approach is deployed at scale and handles billions of financial transactions each year. Our solution combines fragments of information from millions of users in a manner that allows us to accurately recommend user-specific Chart of Accounts categories. Accounts are handled even if they are named using abbreviations or in some unknown foreign language. Transactions are handled even if a given user has never categorized a transaction like that before. The development of such a system and its testing at scale over billions of transactions is a first in the financial industry.

2 UNDERSTANDING THE PROBLEM

There are several major challenges that make automatic categorization of financial transactions a hard problem.

2.1 COUNTER PARTY IDENTIFICATION

Unlike an invoice or receipt, the data of transactions downloaded from banks generally does not have explicit information about the line items pertinent to the transaction (what was purchased, for example).
Data Science in Fintech, SIGKDD'18, August 2018, London, UK

But there are multiple other attributes that may correlate with the choice of the correct account. Examples include: (1) the financial institution that recorded the transaction, (2) the specific bank account and its attributes, (3) the date and time of the transaction, (4) the amount and (5) the counter party with whom the transaction took place. The most predictive attribute, however, is the counter party, and for simplicity in what follows we use the terms transaction and counter party interchangeably.

Transaction descriptions downloaded from banks often contain traces of information related to (1) counter party name, (2) transaction location, (3) time/date, (4) payer identity, (5) transaction reference ID numbers, (6) payment method and (7) other details, all merged into a single string whose variable format depends not just on the counter party identity but also on the payment method (debit card vs. VISA, etc.) and on the specific path the transaction took through the payment network before reaching the payer's bank. The transactions at a single location of the popular San Francisco Safeway grocery store appear in well over 300 different description formats. A billion US bank-downloaded transactions yield about four hundred million unique transaction descriptions, and after some normalization and cleaning (case folding, digit folding, etc.) this reduces to about one hundred million unique descriptions.

For the purpose of transaction categorization, ideally we want to know the counter party category and, in the best case, the specific items or services involved in the transaction. Unfortunately, at best only information about the name of the counter party is available. Counter party identification and transaction item identification are discussed elsewhere. Here we assume that counter party identity is all that is known about the transaction.

2.2 CHART OF ACCOUNTS IS A PERSONOMY

The chart of accounts used by a small business is a taxonomy of its business transactions.
It is used to calculate tax liability. It is also used to generate financial reports and project forecasts, and to analyze business aspects such as customer or product profitability and/or counter party selection and management. While some of these concerns assume that accounts can be mapped to categories driven by taxation and financial analysis, the need for business insights often drives the adoption of business-specific categories that may relate to the organization of the business, the structure of the market and/or the organization of the company's customers, products or suppliers. For a small business the Chart of Accounts (CoA) can thus be thought of as a personomy [2], which is understood as a user-specific vocabulary applied to the categorization of items (resources) from a shared domain such as, for example, web resource bookmarks, photos, or bibliographic references. Since there are no strict rules for the organization and description of the chart of accounts, and because it is defined by accountants based on the idiosyncrasies of the specific business, the variation is such that hardly any two businesses have the same Chart of Accounts; and even if accounts are named in the same way, there is no simple way to determine whether they actually serve the same or different purposes. A million companies have on the order of a hundred million accounts, of which roughly half have account names that are entirely unique. Applying some linguistic analysis and normalization, it is possible to bring the set of different account descriptions down to about 4 million unique names. While there is a small number of categories that are used by many companies, the number of unique categories grows linearly with the number of companies (see Fig. 1). This growth makes it apparent that most companies use company-specific, unique accounts in their CoA.
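The paper does not spell out its normalization steps; Section 2.1 names case folding and digit folding for bank descriptions, and this section mentions linguistic normalization of account names. A minimal sketch, assuming only simple folding (the function names and the choice to fold every digit run to "0" are illustrative assumptions, not the authors' pipeline), shows how such folding collapses surface variants into shared keys:

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Fold a raw bank description or account name into a canonical key.

    Assumed steps (illustrative only): Unicode NFKC normalization, case
    folding, digit folding (each run of digits becomes "0"), and collapsing
    punctuation, underscores and whitespace into single spaces.
    """
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"\d+", "0", text)      # digit folding
    text = re.sub(r"[\W_]+", " ", text)   # punctuation/underscore -> space
    return " ".join(text.split())         # trim and collapse whitespace


def distinct_count(strings: list[str]) -> int:
    """Number of distinct keys left after normalization."""
    return len({normalize(s) for s in strings})


names = ["Office Supplies", "office supplies", "OFFICE-SUPPLIES 2018"]
print(distinct_count(names))  # 2: "office supplies" and "office supplies 0"
```

Folding every digit run to a single token deliberately discards store numbers and dates, which is usually what grouping variants of one counter party requires; it does not identify the counter party itself, which the paper treats as a separate problem.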
Figure 1: Number of distinct account names (used to file transactions) as a function of company count.

3 THE SPACE OF APPLICABLE SOLUTIONS

3.1 CLASSIFICATION

It is possible to think of categorization of financial transactions as a classification problem. There are billions of transactions that have been categorized by small businesses in the past that can be used as labeled data, so a training and test set can be constructed for supervised learning. Unfortunately, an attempt to approach the problem in this way is not likely to succeed. Learning from 10^9 examples (categorized financial transactions) how to classify on the order of 10^8 unique items (counter parties identified from transaction descriptions) into 10^6 classes (distinct accounts) is not likely to result in a useful system, simply because there is not nearly enough data: on average that is only about ten labeled examples per item, spread over a million possible classes. One could instead construct company-specific classifiers, using as a labeled set the transactions that the business owner has manually categorized in the past. Unfortunately, that would not fundamentally change the fact that there is not enough data to achieve good coverage for future transactions. Specifically, on average about 50% of transactions are with new counter parties. That means that even if future transactions with an already-seen counter party were always categorized correctly, and new counter party transactions were assigned to the most popular account, the overall accuracy of such a classifier would not exceed 60% in the common case (50% from repeat counter parties, plus about 10 points from the new counter parties that happen to belong to the most popular account). Nevertheless, company-specific classifiers are useful, and they are the state of the practice today among other vendors of online accounting. While this simple approach solves the easy part of the problem, it is possible to do much better by properly integrating the knowledge of how many different businesses categorize transactions into their own personalized charts of accounts.

3.2 RECOMMENDER SYSTEM

When a company C downloads from its bank a transaction with counter party V, and a transaction with V has previously been categorized by C into account A, there is a high probability that the new transaction with V should be categorized into the same account. This, however, applies to only about 50% of all transactions. A major question, then, is how the accounting history of other companies that may have categorized V in the past can be used to influence the prediction made for C. Such problems are commonly addressed by recommender systems. When a large community of users U performs classification of a large set of items I into a (commonly small) set of categories, several algorithms [1] can be used to infer the behavior of users on new items from the behavior of similar users on similar items, where user similarity is determined by how they have classified the same items in the past. The challenge in applying recommender approaches to our case lies in the fact that we deal with on the order of 10^6 categories that cannot be presumed to be shared between companies.
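The company-specific baseline discussed in Sections 3.1 and 3.2 (reuse the account a counter party was last filed to, otherwise fall back to the company's most popular account) can be sketched as follows. Class and method names are hypothetical, and the sketch assumes counter parties have already been identified as normalized strings. With about half of all transactions involving new counter parties, the fallback branch is what caps this kind of classifier near the 60% ceiling described above.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


class CompanyBaseline:
    """Hypothetical per-company classifier used as a baseline.

    Known counter party -> the account it was last filed to by this company.
    New counter party   -> the company's most frequently used account.
    """

    def __init__(self) -> None:
        self.last_account: dict[str, str] = {}
        self.account_counts: Counter[str] = Counter()

    def learn(self, counter_party: str, account: str) -> None:
        """Record one manually categorized transaction."""
        self.last_account[counter_party] = account
        self.account_counts[account] += 1

    def predict(self, counter_party: str) -> Optional[str]:
        if counter_party in self.last_account:
            return self.last_account[counter_party]
        if self.account_counts:
            return self.account_counts.most_common(1)[0][0]
        return None  # company has no history at all


baseline = CompanyBaseline()
for cp, acct in [("safeway", "Groceries"), ("pg&e", "Utilities"),
                 ("costco", "Groceries")]:
    baseline.learn(cp, acct)
print(baseline.predict("pg&e"))        # Utilities (known counter party)
print(baseline.predict("new vendor"))  # Groceries (most popular account)
```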
Accounts are richer objects than tags since, in addition to the label, accounts include other attributes that may carry some meaningful information with respect to business transaction categorization. On the other hand, none of the attributes associated with an account is a fully reliable indicator of shared semantic interpretation. For example, two accounts of different users may have the same label and be used for different purposes by their owners, while in other circumstances two accounts with different labels might be used for the same purpose. Therefore, the attributes of the accounts (for example, the account label) into which V was categorized by other companies are relatively weak predictors of the correct account for V in C's CoA. To illustrate this point, we can look at how well popular account names discriminate between popular counter parties when, in integrating categorization results from multiple companies, identically or similarly named accounts across companies are treated as representing the same category.

Figure 2: Counter party to account name mapping for the 1000 most popular counter parties and account names.

One can readily observe that identical or similar account names, when seen as representing the same category across companies, fail to discriminate between different counter parties, which invites the conclusion that account labels are not semantically consistent across different companies.

3.3 PERSONALIZED TAG PREDICTION

A flavor of recommender system that is closer to our problem is known as a personalized tag recommender [3, 4, 7]. A typical scenario of personalized tagging arises when users annotate and tag a large collection of resources for personal use, which may also include some form of communication and sharing [5]. This is the case when users bookmark websites, or tag or categorize news and research articles and photos [6].
Unlike usual applications of recommender systems that rely on a shared vocabulary as labels for resource categories, here users have their own, unique vocabularies as category labels, and [2] proposed an approach for translating between different vocabularies. Personalized tag recommenders answer the question: which n tags t are most likely to be assigned by user u to a resource r? This is exactly the question we need to answer when predicting a ranked list of accounts into which a company is most likely to categorize a transaction with a given counter party. A common approach to this problem is to use normalized tag co-occurrence frequencies in resource annotations, for example the Jaccard index over tags applied to the same resource. If $a_i$ and $a_j$ are two accounts in the CoA of some companies, their similarity can be computed as

$$J_{ij} = J(a_i, a_j) = \frac{|a_i \cap a_j|}{|a_i \cup a_j|}$$

where $|a_i|$ stands for the number of transactions categorized into $a_i$, so that $|a_i \cap a_j|$ counts the same transactions categorized into both accounts. In our case, however, accounts typically contain at most a few hundred transactions, representing perhaps a couple dozen distinct counter parties. While in typical social tagging systems items (resources) are tagged by users using unique tags, personalized transaction categorization handles 10^7 counter parties (items) for 10^6 users with 10^8 accounts (tags). As a result, it is much more efficient to collect counter party co-occurrence statistics over accounts than account co-occurrence statistics over counter parties.

4 ACCOUNT LIKELIHOOD RANKING

4.1 USERS WITH HISTORY

Personalized transaction categorization assigns transaction $t_i$ to account $a_j$ according to maximum likelihood, given the company's specific CoA and the transactions that have been assigned to its accounts so far. For users who have already categorized some transactions, we use their previous counter party assignments to accounts as the clue for future assignments: if the exact counter party has been categorized before by the particular user, then this counter party-account assignment is used as the prediction. Otherwise we choose the account that contains the collection of counter parties having the highest co-occurrence with the transaction's counter party. More formally, let each counter party be represented as an n-dimensional vector of its normalized co-occurrence with other counter parties:

$$\vec{T}_i = (J_{1i}, J_{2i}, \ldots, J_{ii}, \ldots, J_{ni}), \quad i = 1, 2, \ldots, n, \quad 0 \le J_{ij} \le 1$$

Then the likelihood for a counter party $t_j$ to be categorized into an account $a_i$ is given by

$$P(a_i \mid t_j) \propto \sum_{t_k \in a_i} J_{jk}$$

The final prediction then amounts to selecting the accounts with the highest scores. We have used several different measures of counter party co-occurrence and evaluated their performance on a cross-validation dataset.
One measure for counter party co-occurrence is the Kulczynski similarity index:

$$J_{ij} = J(t_i, t_j) = \frac{1}{2}\left(P(t_j \mid t_i) + P(t_i \mid t_j)\right) = \frac{1}{2}\left(\frac{|t_i \cap t_j|}{|t_i|} + \frac{|t_i \cap t_j|}{|t_j|}\right)$$

Another is the Jaccard index:

$$J_{ij} = J(t_i, t_j) = \frac{|t_i \cap t_j|}{|t_i \cup t_j|} = \frac{|t_i \cap t_j|}{|t_i| + |t_j| - |t_i \cap t_j|}$$

where:
$|t_i|$ = number of accounts that have transactions with counter party $t_i$;
$|t_i \cap t_j|$ = number of accounts that have transactions with both counter parties $t_i$ and $t_j$;
$|t_i \cup t_j|$ = number of accounts that have transactions with either counter party $t_i$ or $t_j$.

Because the Jaccard index is not null-invariant, it is strongly affected by asymmetry in the frequencies of counter parties. That is, the Jaccard similarity of a common counter party with an infrequent counter party approaches 0 even though the two counter parties are very likely to co-occur in the same account. In other words, the Jaccard index loses information from and for common counter parties. The Kulczynski measure, on the other hand, is null-invariant and preserves information even in the case of asymmetric counter party frequencies. In our experiments, however, we found that using the Jaccard index gave slightly better categorization accuracy. One explanation is our strategy of giving preference to accounts that already contain a transaction with the given counter party: if the user has already categorized a given counter party before, that previous account assignment is re-used as the prediction when this counter party needs to be categorized again. This is a special case of self-co-occurrence; that is, for each user k,

$$J_{ii}^{(k)} = \begin{cases} 1, & \text{if user } k \text{ categorized } t_i \text{ before} \\ 0, & \text{otherwise} \end{cases}$$

This strategy likely applies to transactions with a common counter party, thus masking the lack of null-invariance of the Jaccard index.
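The frequency-asymmetry effect described above is easy to see numerically. A minimal sketch of both indices, computed from the three account counts defined above (the function names and the example counts are illustrative, not data from the paper):

```python
def jaccard(n_i: int, n_j: int, n_ij: int) -> float:
    """|t_i ∩ t_j| / (|t_i| + |t_j| - |t_i ∩ t_j|).

    n_i, n_j: accounts containing counter party t_i (resp. t_j);
    n_ij: accounts containing both.
    """
    union = n_i + n_j - n_ij
    return n_ij / union if union > 0 else 0.0


def kulczynski(n_i: int, n_j: int, n_ij: int) -> float:
    """0.5 * (P(t_j | t_i) + P(t_i | t_j)) with the same counts."""
    if n_i == 0 or n_j == 0:
        return 0.0
    return 0.5 * (n_ij / n_i + n_ij / n_j)


# A common counter party (1000 accounts) and a rare one (10 accounts)
# that appear together in 9 of those accounts.
print(round(jaccard(1000, 10, 9), 4))     # 0.009
print(round(kulczynski(1000, 10, 9), 4))  # 0.4545
```

The rare counter party co-occurs with the common one in 90% of its accounts, yet the Jaccard value is dominated by the common counter party's large count, while Kulczynski averages the two conditional probabilities and keeps the strong signal.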
Though the counter party co-occurrence matrix $(\vec{T}_1, \vec{T}_2, \ldots, \vec{T}_n)^T$ could be on the order of 10 million by 10 million, it is very sparse and can be efficiently reduced to about a billion non-zero elements, making real-time account scoring relatively efficient. For prediction, given a transaction t from user k, for every account $a_i$ that has a transaction history, calculate:

$$J_{a_i} = \sum_{m \in a_i} J_{tm}$$

where $J_{tm}$ is the coupling value between counter party t and counter party m (m ranges over all other counter parties in $a_i$ that are coupled with counter party t). Finally, choose the account as:

$$a^* = \arg\max_{a_i} J_{a_i}$$
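A minimal sketch of the Section 4.1 scoring, assuming counter parties are already identified as string ids and using the Jaccard variant. It collects counter party co-occurrence over accounts, the inverted relation argued for in Section 3.3: each account holds only a few dozen counter parties, so enumerating pairs per account is cheap. A plain dict-of-dicts stands in for the production sparse matrix; the function names and the `None` fallback are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_jaccard(accounts):
    """Sparse counter party co-occurrence collected over accounts.

    accounts: iterable of sets of counter party ids, one set per account,
    pooled across all companies. Returns J[t][m] for t != m, storing only
    pairs that co-occur in at least one account (so only non-zero values).
    """
    freq = defaultdict(int)   # |t|: number of accounts containing t
    both = defaultdict(int)   # |t ∩ m|: accounts containing t and m
    for cps in accounts:
        for t in cps:
            freq[t] += 1
        for t, m in combinations(sorted(cps), 2):
            both[(t, m)] += 1
    J = defaultdict(dict)
    for (t, m), n_tm in both.items():
        J[t][m] = J[m][t] = n_tm / (freq[t] + freq[m] - n_tm)
    return J


def predict(J, company_accounts, t):
    """company_accounts: account name -> set of counter parties this company
    has already filed there. Returns the best account name or None."""
    # Reuse a previous assignment of the same counter party (J_tt = 1).
    for name, cps in company_accounts.items():
        if t in cps:
            return name
    # Otherwise score each non-empty account by J_a = sum over m in a of J_tm.
    row = J.get(t, {})
    scores = {name: sum(row.get(m, 0.0) for m in cps)
              for name, cps in company_accounts.items() if cps}
    if not scores or max(scores.values()) == 0.0:
        return None  # no signal: fall back to attribute-based scoring
    return max(scores, key=scores.get)


pooled = [
    {"safeway", "costco", "trader joes"},
    {"safeway", "costco"},
    {"shell", "chevron"},
    {"shell", "chevron", "costco"},
]
J = build_jaccard(pooled)
mine = {"Food": {"safeway"}, "Vehicle": {"shell"}, "Misc": set()}
print(predict(J, mine, "trader joes"))  # Food
print(predict(J, mine, "chevron"))      # Vehicle
```

Storing only co-occurring pairs is what keeps the structure sparse; at the scale described (about a billion non-zero elements) a compressed or sharded sparse store would replace the Python dictionaries.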

4.2 NEW USERS AND EMPTY ACCOUNTS

While the approach described in the previous section is effective for ranking accounts that already have multiple transactions assigned, users that have not yet categorized any transactions must be supported as well, as must users that have empty accounts in their chart of accounts. In these cases, we resort to recommendations that use account attributes. However, due to the low reliability of account attributes, we apply a relatively simple procedure. As mentioned earlier, accounts are described by multiple attributes, each of which can be used as a feature to predict how likely it is that the new transaction should be assigned to this account. In what follows we use account names as an example. We use TF-IDF values in the context of the transaction-account matrix as a measure of how strongly transactions and account names are coupled. Table 1 shows our transaction-account matrix, in which every element is calculated as:

$$P(T_i \mid A_j) = \left(1 + \log f_{T_i,A_j}\right) \cdot \log \frac{N}{\left|\{\, j : f_{T_i,A_j} > 0 \,\}\right|}$$

where $f_{T_i,A_j}$ counts how often transaction $T_i$ was assigned to account $A_j$, $N$ is the total number of accounts, and the denominator counts the accounts to which $T_i$ has been assigned.

Table 1: Transaction-account name co-occurrence matrix

In this way we have scored empty accounts for their likelihood of being the right category for the new transaction. For prediction, given a transaction t from user k, for every account $a_j$ in this user's CoA, choose the account as:

$$a^* = \arg\max_{j} P(t \mid a_j)$$

5 EXPERIMENT RESULTS

We measure system performance as the ratio of automatically categorized transactions accepted by users without changes to the total number of transactions imported by the users from financial institutions. Our system performs consistently with accuracy above 70%, as well as high precision and recall as calculated at the individual account level (Fig. 3).

Figure 3: Precision and recall for transaction categorization.

Fig. 4 shows that higher accuracy is attained when the model is trained on data that is closer in time to the test set. There is a clear benefit to periodic re-training using ever newer data to keep the model fresh and maintain high accuracy.

Figure 4: Accuracy as a function of the gap in days between the training set (first in time) and the test set (next in time); axes: accuracy vs. age of last transaction (days). Linear decay (blue) and exponential decay (red) have a similar fit.

Categorized transactions are reviewed by accountants for accuracy. If the confidence of a prediction correlates well with its accuracy, communicating that confidence to the user can enable differential levels of trust and make the review process more efficient.
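Confidence is derived differently for counter parties the company has categorized before (a linear function of elapsed time, matching the linear fit in Fig. 4) and for new counter parties (a ratio of the best account's likelihood to the alternatives), as the next paragraph explains. A minimal sketch, assuming placeholder coefficients and one particular reading of "ratio vs. alternatives" (the best score's share of the total); function names and defaults are hypothetical:

```python
def known_cp_confidence(days_since_last: float,
                        intercept: float = 0.95,
                        slope_per_day: float = 0.001) -> float:
    """Confidence for a counter party this company has categorized before.

    Linear decay in the days since the counter party was last categorized,
    clipped to [0, 1]. intercept and slope_per_day are placeholder values,
    not figures from the paper; they would be fitted to data like Fig. 4.
    """
    return min(1.0, max(0.0, intercept - slope_per_day * days_since_last))


def new_cp_confidence(scores: dict[str, float]) -> float:
    """Confidence for a new counter party from per-account likelihood scores.

    Assumed reading of "ratio of the most appropriate account vs.
    alternatives": the best score's share of the total score mass.
    """
    total = sum(scores.values())
    if not scores or total <= 0:
        return 0.0
    return max(scores.values()) / total


print(known_cp_confidence(0))   # 0.95
print(new_cp_confidence({"Food": 0.5, "Vehicle": 0.25, "Misc": 0.25}))  # 0.5
```

A best-to-runner-up ratio is another plausible reading; it is unbounded, so it would need its own calibration before being shown to users.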

We need to associate confidence with predictions made for transactions with new counter parties as well as with counter parties the company has done business with in the past. For known counter parties, we calculate the confidence associated with each prediction as a linear function of the time between when the transaction happened and when the same counter party was last categorized by the same user, as may be apparent from Fig. 4. For predictions with respect to new counter parties, confidence is calculated as the ratio of the likelihood of the most appropriate account vs. alternative accounts. Fig. 5 shows that categorization accuracy correlates well with the confidence.

Figure 5: Categorization accuracy vs. confidence.

6 CONCLUSIONS

We have presented a new method for automatic categorization of financial transactions for small business accounting. Our approach is deployed at scale within a market-leading accounting system handling billions of financial transactions a year. Our solution combines fragments of information from millions of users in a manner that allows us to accurately recommend user-specific Chart of Accounts categories. Accounts are handled even if they are named using abbreviations or in some unknown foreign language. Transactions are handled even if a given user has never categorized a transaction like that before. The approach is applicable to other problems that do not fit neatly into the usual categories of classification and recommendation and that have elements of both supervised and unsupervised statistical machine learning.

REFERENCES

[1] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutierrez. Recommender systems survey. Knowledge-Based Systems 46, 2013.
[2] Robert Wetzker, Alan Said, and Carsten Zimmermann. Understanding the user: Personomy translation for tag recommendation.
[3] Meiqun Hu, Ee-Peng Lim, and Jing Jiang. A Probabilistic Approach to Personalized Tag Recommendation. In 2010 IEEE Second International Conference on Social Computing (SocialCom), 2010.
[4] Fabian Abel, Samur Araujo, Qi Gao, and Geert-Jan Houben. Analyzing Cross-System User Modeling on the Social Web. In Web Engineering: 11th International Conference, ICWE 2011. Springer-Verlag, Berlin, 2011.
[5] R. Jaeschke, L. B. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Proc. 11th European Conference on Principles and Practice of Knowledge Discovery in Databases.
[6] B. Sigurbjornsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th International Conference on World Wide Web, 2008.
[7] Steffen Rendle, Leandro B. Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.