Do Intensity Distributions in Words Suggest Tempering in Cheques?


A report submitted to the Institute for Development and Research in Banking Technology, Hyderabad

In partial fulfillment of the summer internship of Bachelor of Technology in Computer Science & Engineering

By Kamal Singh Meena
Indian Institute of Technology, Indore

July

CERTIFICATE

This is to certify that the project work entitled "Do Intensity Distributions in Words Suggest Tempering in Cheques?", submitted in partial fulfillment of the requirements for the summer internship of Bachelor of Technology in Computer Science & Engineering to the Indian Institute of Technology, Indore, is a bona fide work carried out by Kamal Singh Meena ( ) under my guidance. The matter embodied in the project report has not been submitted for the award of any other degree or diploma.

Project guided by
Dr. Rajarshi Pal
Assistant Professor
Institute for Development and Research in Banking Technology (IDRBT), Hyderabad

DECLARATION

I, Kamal Singh Meena, hereby declare that this dissertation entitled "Do Intensity Distributions in Words Suggest Tempering in Cheques?", submitted by me under the guidance and supervision of Dr. Rajarshi Pal, Assistant Professor, IDRBT, is a bona fide work. I also declare that it has not been submitted previously, in part or in full, to this institute or to any other university or institute for the award of any degree or diploma.

Date: 11 July, 2014
Place: IDRBT, Hyderabad

Signature of the student
Name: Kamal Singh Meena
Roll No:

ACKNOWLEDGEMENT

I would like to take this opportunity to wholeheartedly thank the gracious souls without whom this project would have remained an unfulfilled dream. I express my sincere gratitude to Dr. Rajarshi Pal, Assistant Professor, IDRBT, India, for his stimulating guidance, continuous encouragement and supervision throughout the course of the present work. I am extremely thankful to Shri B. Sambamurthy, Director, IDRBT, for providing me with the infrastructural facilities to work in, without which this work would not have been possible. I would like to thank my friends for being a constant source of encouragement and motivation, and for their invaluable help in my project. Finally, I thank my parents for their love and encouragement.

Sincerely,
Kamal Singh Meena ( )

ABSTRACT

White-collar crimes committed through addition to or alteration of valuable data on cheques and other instruments are among the fastest-growing problems around the world. Solutions for examining such fraud cases are still limited in the literature. In particular, automatic detection of fraud by non-destructive techniques has limitations. The RBI (Reserve Bank of India) has created the capability to enable detection of fraud in the Cheque Truncation System (CTS), and CTS has emerged as an important efficiency enhancement initiative undertaken by the RBI in the payment systems area. In this project, we restrict ourselves to an experiment on whether fraudulent words in cheques can be detected or classified automatically using only intensity values. In this technique, we remove all background lines and words, separate each word written by the cheque holder, and store its intensity values from the gray cheque image. We then find the difference in intensity distribution between each pair of words using the Behrens-Fisher statistic. Finally, using a threshold (λ-cut) value, we classify the words into different classes according to their intensity values.

Contents of Thesis

Chapter 1: Introduction
    1.1 Basics about paper document fraud
    1.2 Fraud detection techniques
    1.3 Cheque Truncation System
    1.4 Banking Terminologies
    1.5 Scope of Automated Alteration Detection and Objective of Thesis
Chapter 2: Related Work
Chapter 3: Intelligent Grouping of Pixels
    3.1 Template Based Identification of Important Regions
    3.2 Removing Background Lines
    3.4 Detecting the Position of Words
    3.5 Calculating the Distribution of Intensity Value of All Words
    3.6 Converting Tolerance Matrix to Equivalence Matrix
    3.7 Classification using λ-cut value
Chapter 4: Results
Chapter 5: Conclusion and Future Work
Chapter 7: References

List of Figures

1.1 Fraud Cheque Image of Figure 1.2
1.2 Original Cheque Image
1.3 Fraudulent Cheque Image of Figure 1.4
1.4 Original Cheque Image
1.5 Process Flow of CTS
3.1 Gray Cheque Image
3.2 Template Image of Figure 3.1 with cheque information
3.3 Horizontal Projection Profile of Pay Name Region
3.4 Binary Image After Removing Background Lines and Words
3.5 Vertical Projection Profile of Pay Name Portion
4.1 FPR vs. TPR
4.2 Comparison between FPR and TPR for various λ values

List of Tables

3.1 Positions of Words
3.2 Tolerance Matrix
3.3 Equivalence Matrix
3.4 Binary Matrix for Classification
4.1 Calculating TPR and FPR

Chapter 1: Introduction

At present, people are moving from ledger-style records to software-based systems. The ledger style still exists in many areas, however, because of a lack of online resources and security. The number of forgeries in the ledger style is also high: every day, somewhere in the world, people fall victim to such forgeries. We refer to this type of crime as white-collar crime.

1.1 Basics about paper document fraud:

A small addition or alteration to a cheque by another party can be disastrous for the authority or person concerned. A fraudulent addition to cheques, contracts and other legal documents may cause irreparable damage in terms of human suffering as well as serious financial loss. According to the American Bankers Association, fraud costs the nation's financial system around 10 billion dollars annually, and losses continue to rise at an alarming rate [4]. Cheque fraud and counterfeiting are among the fastest-growing problems affecting the nation's financial system. It is a great challenge to detect fraud in cheques automatically using software, especially when the fraudulent writing uses ink of the same color and the same orientation.

Alteration is a broad term. It means a change made by addition, by deletion, or by adding extra strokes. Alterations can be made to many documents in daily life, such as bank cheques, business agreements and educational documents. A few examples [9]: a contract may be changed after the parties have come to an agreement and signed it, a cheque may be raised to a higher amount, the data on a document may be changed, and doctors may add notes to their patients' records. Several such cases are reported every day to law enforcement agencies around the globe.

1.2 Fraud detection techniques:

Fraud detection techniques can be categorized along two major pathways. The first is destructive detection, based on IR luminescence, chemical examination, Fourier-Transform Infrared (FTIR) microscopy, laser examination, etc. The second is non-destructive detection, which includes modern chromatography, image processing, pattern recognition [12], [15], the nearest neighbor classifier, the support vector machine [13], etc. The technique we use for fraud detection is a non-destructive one that relies on the intensity distribution of words.

Some cases of alteration follow.

Case (1) - See Figure 1.1 and Figure 1.2 for a cheque with an additional fraudulent word and its original image. Let us assume that Mr. X gave a cheque to Mr. Y, where Mr. Y's name is Mohan Singh and the cheque amount is twenty thousand rupees. A third person, say Mr. Z, stole that cheque from Mr. Y and, with malicious intent, changed the pay name of the cheque to Mohan Singh Chaudhary by adding the word "Chaudhary" in ink of the same color and in the same orientation. It then becomes easy for Mr. Z to cash that cheque in his own name. Detecting this type of fraudulent word automatically is a very difficult task.

Figure 1.1: Fraud Cheque Image of Figure 1.2

Figure 1.2: Original Cheque Image

Case (2) - Figure 1.3 and Figure 1.4 present the image of an altered cheque (alteration within a word) and its original. Let us assume that Mr. A gave a cheque to Mr. B, where Mr. B's name is Rajesh Kumar and the cheque amount is one lakh rupees. Mr. B, with malicious intent, changed the Rupees region of the cheque by adding the character "F", altering the character "e" so that it looks like "r", and adding a stroke to turn the digit 1 into 4, all in ink of the same color and in the same orientation. It is now easy for Mr. B to cash that cheque for the altered amount.

Figure 1.3: Fraudulent Cheque Image of Figure 1.4 (taken from [1]).

Figure 1.4: Original Cheque Image (taken from [1]).

1.3 Cheque Truncation System:

The Cheque Truncation System (CTS) is an online image-based cheque clearing system (ICS). It provides faster clearing of cheques. Truncation means stopping the movement of the physical cheque issued by a drawer to the drawee branch. Instead of carrying the physical cheque, the system captures an image of the cheque and sends it to the paying bank through the clearing house for verification, along with relevant information such as the data on the MICR (magnetic ink character recognition) band, the date of presentation and the presenting bank. With this technique, the physical instrument needs to move across branches for clearing purposes only in exceptional circumstances.

The Reserve Bank has implemented CTS. All types of cheques can be presented for clearing through CTS. For customers, using it is no different from using traditional cheque clearing, and the process does not change.

If a person commits fraud between the presenting bank and the paying bank, many technologies are available to detect it, such as watermarking and data embedding. But if a person deposits an altered cheque at the presenting bank after making a malicious alteration to it, detecting the fraud is very difficult, and we could not find much in the literature that deals with this type of problem.

1.4 Banking Terminologies:

Various banking terms appear repeatedly in this thesis. They are explained here for the reader's convenience.

Presenting Bank: The bank that submits the cheque to the clearing house for clearing, and receives payment from the other bank, is known as the presenting bank.

Paying Bank: The bank of the customer who issues the cheque. After verifying the account details of the customer, the paying bank credits the amount written on the cheque to the presenting bank.

Clearing House: The clearing house plays the intermediate role between the presenting bank and the paying bank. Throughout the day, it collects cheques from various presenting banks and sends them to the appropriate paying banks. At the end of the day, monetary settlement between the various banks is mediated by the clearing house.

Clearing House Interface: The CHI is software installed on the servers of all banks participating in CTS. The CTS Clearing House Interface (CHI) provides easy-to-use and standardized connectivity between the presenting/paying bank systems of a bank and the clearing house (CH). It provides a gateway for the conduit of data and images.

Figure 1.5: Process Flow of CTS (taken from [15]). Participants: Presenting Bank, Presenting Bank Warehouse, Clearing House, Paying Bank. Steps: 1. Presents cheques; 2. Captures cheque image; 3. Sends the image; 3.1 Sends the image; 4. Store physical cheque; 5. Verification; 6. Sends confirmation; 7. Credit amount; 8. Settlement takes place.

1.5 Scope of Automated Alteration Detection and Objective of Thesis:

The scope of this thesis is the detection of fraudulent papers and cheques in the Cheque Truncation System (CTS). CTS gives customers many benefits [11] in various areas, including human resource rationalization, cost effectiveness, business process re-engineering, better service and adoption of the latest technology. Automated fraud detection technology can be used in many settings, such as banking, business, colleges and institutes. Wherever data are written with pen and paper, fraud is possible. CTS facilitates a reduction in clearing cycles. Moreover, there is no fear of losing instruments in transit, because this technique requires only an image of the document. There are no geographical limitations on the use of fraud detection techniques.

The benefits of automated alteration detection techniques [11] can be summarized as follows:

- Shorter clearing cycle for the detection of fraudulent instruments.
- Superior verification and reconciliation process for online fraud detection.
- No geographical restrictions as to jurisdiction.
- Operational efficiency for banks and customers.
- Reduced operational risk and reduced risks associated with paper clearing, since the physical cheque does not need to be carried around to check whether it is fraudulent.

We can detect a fake cheque or a scanned copy of a cheque by comparing the features specified in CTS-2010, such as embedded verifiable features, bar codes, encrypted codes, logos, watermarks and holograms, for early interception of altered or forged instruments. But if fraud is committed by inserting or changing words or digits in an original cheque, it is more difficult to detect.

The main objective of this thesis is to provide experimental results from which we can conclude whether the addition or alteration of words (written with malicious intention) in a cheque can be detected using only the intensity distribution of each word in the cheque image.

Chapter 2: Related Work

This is an emerging area, and we were not able to find much about automatic fraud detection in the literature. Even today, techniques for automatic fraud detection are limited, although many techniques are available for checking fraud manually or physically. For example, forensic document examination looks at the physical, chemical and microscopic characteristics of a suspect paper. In infrared [14], microscopic or other examinations, rays are passed through the paper and their visibility on the other side is checked. In physical examination, parameters such as logos, the void pantograph and other important regions are checked. In chemical examination, chemicals are applied to check whether they produce different effects for different words or the same effect for all words.

An automated pattern recognition framework [12] and the support vector machine [13] have also been applied to this type of problem. Detection of alterations made with ball-point pen strokes in a specific area or region is discussed in [1]. The authors use a multilayer perceptron (MLP)-based feature selection network. This method associates an adaptive gate with each feature of the network. Initially, each gate is assumed to be closed, and the degree to which a gate opens determines the goodness of the associated feature. The input to the network for each feature is the product of the feature value and the associated gate value, and that product is passed to the next layer over the iterations. Three different classifiers are used, namely K-nearest neighbor, MLP and support vector machines.

The stereomicroscope, the wax lift technique and the distilled water technique are applied to intersecting lines of varying density and color in [8]. The authors use water-soluble media, check the effect of distilled water on the intersecting lines, and identify the fraud cases from it.

An examination of 56 inks with an argon-ion laser [5] revealed that the laser will sometimes stimulate infrared luminescence in inks that did not produce infrared luminescence under the video spectral comparator. This infrared luminescence was always in a higher range, that is, at a longer wavelength, than that found under the video spectral comparator. This shows that laser-induced infrared luminescence may be a useful tool for differentiating between inks that appear similar.

Chapter 3: Experimental Method

The proposed steps of our experimental method are given below.

Figure 3.1: Gray Cheque Image

3.1 Template Based Identification of Important Regions:

We extract only the important regions of the cheque image shown in Figure 3.1, which are the ones required for our experiment; the rest of the cheque background is not needed. The format of a cheque varies from bank to bank, so a different template needs to be designed for each bank. Even the same bank may have several cheque formats. Based on the format, an appropriate template is invoked to identify the important portions of the cheque. For example, Figure 3.2 shows the template image, with cheque information, of the cheque used in Figure 3.1.

In our proposed method, we extract only three important regions: pay name, rupees and signature. We restrict ourselves to detecting additional words using the intensity distribution, and adding words is possible only in these three regions. In the other regions, such as the date, the account number and the rupees-in-numbers portion, words cannot be added; the existing characters can only be altered with additional strokes.
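The template step can be pictured as cropping fixed rectangles out of the gray image. The following is only a minimal sketch: the region coordinates, the `TEMPLATE` dictionary and the function name `extract_regions` are hypothetical and are not taken from the report, which does not give the template layout.

```python
# Hypothetical template: region name -> (top, left, bottom, right) in pixels.
# Real coordinates depend on the bank's cheque format and are not given here.
TEMPLATE = {
    "pay_name": (0, 0, 2, 4),
    "rupees": (2, 0, 4, 4),
    "signature": (4, 2, 6, 6),
}


def extract_regions(gray, template):
    """Crop every template region out of a gray image given as a list of rows."""
    regions = {}
    for name, (top, left, bottom, right) in template.items():
        regions[name] = [row[left:right] for row in gray[top:bottom]]
    return regions


# Tiny synthetic 6x6 "image" whose pixel value encodes its row and column.
gray = [[r * 10 + c for c in range(6)] for r in range(6)]
regions = extract_regions(gray, TEMPLATE)
```

A separate template dictionary per cheque format would follow the report's remark that each bank, and each format within a bank, needs its own template.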

Figure 3.2: Template Image of Figure 3.1 with cheque information

3.2 Removing Background Lines:

An image is represented by a matrix. We focus only on the pixels that carry the value of the pen's ink, not on background words or lines. So we need to remove all the unwanted extra elements that can be seen in Figure 3.2. To remove the background lines, we apply the horizontal projection profile (HPP) method. The HPP of the pay name portion of Figure 3.2 is shown in Figure 3.3.

Calculating the HPP requires a binary matrix of the cheque image, so we convert the gray image to a binary image. A binary image has only two types of pixels: white (pixel value 1) or black (pixel value 0). We specify a threshold value and binarize the cheque image, excluding the unwanted parts as well. In the horizontal projection profile, we count the number of black pixels in each row. Figure 3.3 shows that some rows have a peak value much higher than the normal values; from this we conclude that these rows are part of the background lines.
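The binarization and row-counting steps described above can be sketched as follows. This is an illustrative sketch, not the report's implementation: the threshold of 128 and the function names are assumptions, since the report only says that "some threshold value" is specified.

```python
def binarize(gray, threshold=128):
    """Map a gray image to binary: 0 (black) below the threshold, 1 (white) otherwise.

    The default threshold of 128 is an assumed value, not one from the report.
    """
    return [[0 if pixel < threshold else 1 for pixel in row] for row in gray]


def horizontal_projection_profile(binary):
    """Count the black pixels (value 0) in every row."""
    return [row.count(0) for row in binary]


gray = [
    [250, 240, 245, 250],   # blank paper
    [250,  60, 245,  70],   # some pen ink
    [ 40,  50,  45,  55],   # a printed ruling line across the full width
    [250, 240,  80, 250],   # some pen ink
]
binary = binarize(gray)
hpp = horizontal_projection_profile(binary)
```

In this toy image the third row is black across its whole width, so its HPP value stands out from the others in the same way as the peaks described for Figure 3.3.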

Figure 3.3: Horizontal Projection Profile of Pay Name Region

To remove these background lines, we replace their pixel values with the white pixel value. See Figure 3.4 for the binary image after removing background lines and background words.

Figure 3.4: Binary Image After Removing Background Lines and Words
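A minimal sketch of the line-removal step, under the assumption that a row counts as a background line when its black-pixel count exceeds a fixed limit. The parameter `hpp_limit` and the function name are hypothetical; the report does not state how a peak is decided.

```python
def remove_background_lines(binary, hpp_limit):
    """Whiten every row whose black-pixel count exceeds hpp_limit.

    binary uses 0 for black and 1 for white. hpp_limit is an assumed
    parameter standing in for "much higher than normal values".
    """
    width = len(binary[0])
    cleaned = []
    for row in binary:
        if row.count(0) > hpp_limit:
            cleaned.append([1] * width)   # replace the whole row with white
        else:
            cleaned.append(list(row))     # keep the row unchanged
    return cleaned


binary = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],   # background ruling line
    [1, 0, 1, 1, 0, 1],
]
cleaned = remove_background_lines(binary, hpp_limit=4)
```

Whitening a full row also erases any pen ink that happens to cross the line, which is a side effect of this simple row-wise rule.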

3.4 Detecting the Position of Words:

The next step is to detect the positions of the words. To detect the starting and ending position of each word, we calculate the Vertical Projection Profile (VPP). In the VPP, we count the number of black pixels in each column of each region. A continuous run of non-zero VPP values indicates that a word exists there. We wrote an algorithm that detects and stores the positions of the words. The algorithm uses a threshold value to separate the words: if the VPP value is zero for more consecutive columns than the threshold, that run is treated as a space between words. The minimum space between two words is generally around 40 pixels (checked by analyzing some random cheques), and we chose the threshold accordingly; in this experiment we set it to 30 pixels. See Figure 3.5 for the VPP diagram of the pay name region.

Figure 3.5: Vertical Projection Profile of Pay Name Portion

From Figure 3.5 we can say that there are three words in the pay name portion of this cheque. The positions of the words of Figure 3.4 are shown in Table 3.1. In this table, the first three words are from the pay name portion, words 4 to 6 are from the rupees portion, and word 7 is from the signature region.
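The word-separation rule can be sketched as a single pass over the VPP. This is an illustrative sketch rather than the report's algorithm: the function names are assumptions, and only the 30-pixel default comes from the text.

```python
def vertical_projection_profile(binary):
    """Count the black pixels (value 0) in every column."""
    height, width = len(binary), len(binary[0])
    return [sum(1 for r in range(height) if binary[r][c] == 0) for c in range(width)]


def find_words(vpp, gap_threshold=30):
    """Return (start, end) column pairs, splitting where a zero run exceeds gap_threshold."""
    words = []
    start = None      # first ink column of the word being built
    last_ink = None   # most recent column that contained ink
    for col, count in enumerate(vpp):
        if count == 0:
            continue
        if start is None:
            start = col
        elif col - last_ink - 1 > gap_threshold:
            # The run of empty columns is long enough to be a word space.
            words.append((start, last_ink))
            start = col
        last_ink = col
    if start is not None:
        words.append((start, last_ink))
    return words


# Toy profile: a 1-column gap inside a word and a 4-column gap between words.
vpp = [0, 2, 3, 0, 1, 0, 0, 0, 0, 4, 4, 0]
words = find_words(vpp, gap_threshold=3)
```

With a small threshold the short gap inside the first word is ignored and only the long gap splits the profile; with the report's 30-pixel threshold the same toy profile would be read as a single word.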

[Table 3.1: Positions of Words. Columns: word-1 to word-7. Rows: Starting Position, Ending Position.]

3.5 Calculating the Distribution of Intensity Value of All Words:

We apply the Behrens-Fisher statistic, shown in equation (1), to calculate the difference in intensity distribution for each pair of words. In this example there are 7 (seven) words, so there are 21 (twenty-one) pairs in total. The order within a pair does not matter, because we take the absolute difference between the intensity values of the two words.

\[ x_i = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad (1) \]

where $x_i$ is the Behrens-Fisher statistic, $\bar{x}_1$ and $\bar{x}_2$ are the mean intensity values of word1 and word2, $s_1$ and $s_2$ are the standard deviations of word1 and word2, and $n_1$ and $n_2$ are the numbers of pixels in word1 and word2, respectively.

After calculating the intensity distribution difference between each pair of words, we arrange the values in a matrix, which satisfies the symmetric property. To turn it into an equivalence matrix, we perform the operations given below. After normalizing the values, we subtract them from one so that the matrix satisfies reflexivity. The matrix now satisfies both the reflexive and the symmetric property. A matrix that satisfies the symmetric and reflexive properties is called a tolerance matrix; it is shown in Table 3.2.
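The pairwise statistic and the tolerance matrix can be sketched as below. Several details are assumptions that the report does not spell out: the sample variance uses the n - 1 denominator, normalization divides by the largest pairwise value, and the function names are illustrative. The absolute value follows the text's remark about taking the absolute difference.

```python
import math
from itertools import combinations


def behrens_fisher(word1, word2):
    """Absolute Behrens-Fisher statistic between two lists of pixel intensities."""
    n1, n2 = len(word1), len(word2)
    mean1, mean2 = sum(word1) / n1, sum(word2) / n2
    # Sample variances (n - 1 denominator is an assumption).
    var1 = sum((v - mean1) ** 2 for v in word1) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in word2) / (n2 - 1)
    return abs(mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def tolerance_matrix(words):
    """Build a reflexive, symmetric matrix: 1 - (statistic / largest statistic)."""
    n = len(words)
    diff = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff[i][j] = diff[j][i] = behrens_fisher(words[i], words[j])
    largest = max(max(row) for row in diff)
    if largest == 0:
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - d / largest for d in row] for row in diff]


# Made-up intensities for three words; the first two are written with similar ink.
words = [[10, 12, 14], [11, 12, 13, 14], [40, 42, 44]]
tol = tolerance_matrix(words)
```

Each word needs at least two pixels and the two words must not both have zero variance, otherwise the statistic is undefined; real word regions contain many pixels, so this sketch does not guard against that case.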

[Table 3.2: Tolerance Matrix. Rows and columns: Word1 to Word7.]

3.6 Converting Tolerance Matrix to Equivalence Matrix:

Every tolerance matrix can be converted into an equivalence matrix by applying max-min composition. Max-min composition requires at most n-1 iterations, where n is the cardinality of the matrix (the total number of words). Equation (2) gives the max-min composition:

\[ \mu_T(x, z) = \max_{y} \, \min\bigl(\mu_R(x, y), \mu_S(y, z)\bigr) \qquad (2) \]

where $\mu_R(x, y) = \mu_{A \times B}(x, y) = \min(\mu_A(x), \mu_B(y))$, $\mu_S(y, z) = \mu_{A \times B}(y, z) = \min(\mu_A(y), \mu_B(z))$, $T = R \circ S$, $(x, y) \in R$ and $(y, z) \in S$.
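A short sketch of the conversion, assuming the tolerance matrix is composed with itself repeatedly until it stops changing or n-1 compositions have been made. The function names and the 3x3 example values are illustrative only.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations given as nested lists."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def to_equivalence(tolerance):
    """Compose the tolerance relation with itself until it becomes transitive."""
    current = tolerance
    for _ in range(len(tolerance) - 1):      # at most n - 1 compositions
        composed = max_min_compose(current, tolerance)
        if composed == current:              # fixed point reached: transitive
            break
        current = composed
    return current


# Made-up tolerance matrix: words 0 and 2 are linked only weakly (0.2) directly,
# but more strongly through word 1 (min(0.8, 0.6) = 0.6).
tolerance = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
equivalence = to_equivalence(tolerance)
```

The result keeps the diagonal and the symmetry of the input and raises the weak direct link to the strength of the best indirect path, which is what makes the relation transitive.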

After applying max-min composition, we get the equivalence matrix shown in Table 3.3.

[Table 3.3: Equivalence Matrix. Rows and columns: Word1 to Word7.]

3.7 Classification using λ-cut value:

The λ-cut is a threshold value. We choose a suitable threshold to convert the equivalence matrix shown in Table 3.3 into a binary matrix, and from this binary matrix we classify the words. The binary conversion replaces every value less than the threshold by 0 (zero) and every value greater than the threshold by 1 (one). After the conversion, we get the binary matrix shown in Table 3.4.

[Table 3.4: Binary Matrix for Classification. Rows and columns: Word1 to Word7.]

The matrix in Table 3.4 shows that word-1 and word-2 belong to one class, word-3 belongs to a second class, word-4, word-5 and word-6 belong to a third class, and word-7 belongs to a fourth class. Words that belong to the same class have a similar intensity distribution, and each class has its own distinct intensity distribution values.
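The λ-cut and the grouping of words into classes can be sketched as follows. The report does not say how a value exactly equal to the threshold is treated; mapping it to 1 is an assumption here, as are the function names and the example matrix.

```python
def lambda_cut(equivalence, lam):
    """Binary matrix: 1 where the membership is at least lam, otherwise 0.

    Treating a value equal to lam as 1 is an assumption.
    """
    return [[1 if value >= lam else 0 for value in row] for row in equivalence]


def classes_from_cut(binary):
    """Group word indices whose rows in the binary matrix mark them as related."""
    classes = []
    assigned = set()
    for i, row in enumerate(binary):
        if i in assigned:
            continue
        members = [j for j, flag in enumerate(row) if flag == 1]
        classes.append(members)
        assigned.update(members)
    return classes


# Made-up equivalence matrix for three words.
equivalence = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.6],
    [0.6, 0.6, 1.0],
]
cut = lambda_cut(equivalence, 0.7)
classes = classes_from_cut(cut)
```

Lowering λ merges classes and raising it splits them, so the choice of λ directly controls how many words are reported as having a different intensity distribution.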

Chapter 4: Analysis of Results

There are two types of cheques: altered and non-altered (original). We treat altered cheques as positive cases and non-altered cheques as negative cases. A true positive is an alteration correctly identified in an altered cheque. A false positive is an alteration wrongly identified in a non-altered cheque. To calculate the TPR (true positive rate) and the FPR (false positive rate), we use equations (3) and (4), respectively.

\[ \mathrm{TPR} = \frac{\text{Correctly classified alteration cases}}{\text{Total number of alteration cases}} \qquad (3) \]

\[ \mathrm{FPR} = \frac{\text{Wrongly classified alteration cases}}{\text{Total number of non-alteration cases}} \qquad (4) \]

Let us assume that the total number of actual cases is A+B+C+D, of which A+C cheques are non-altered (negative cases) and B+D cheques are altered (positive cases), where:

                                    Actual cases
                                    NO      YES
Experimental observation   YES      A       B
                           NO       C       D

C = non-alteration cases correctly identified among the actual non-altered cheques.
A = alteration cases wrongly identified among the non-altered cheques.
B = alteration cases correctly identified among the altered cheques.
D = non-alteration cases wrongly identified among the altered cheques.

Here YES means an alteration case and NO means a non-alteration case. Now we can write:

\[ \mathrm{TPR} = \frac{B}{B + D} \qquad \text{and} \qquad \mathrm{FPR} = \frac{A}{A + C} \]

We calculate the true positive rate and the false positive rate for 70 (seventy) cheque images using the proposed method. We take 30 (thirty) cheque images as positive cases (altered cheque images) and 40 (forty) cheque images as negative cases (non-altered cheque images). We then calculate the true positive rate and the false positive rate for different values of the λ-cut, as shown in Table 4.1.

[Table 4.1: Calculating TPR and FPR. Columns: λ-value, A, B, C, D, TPR, FPR.]
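As a worked illustration of the two rates, the sketch below uses made-up counts that merely respect the split of 30 altered and 40 non-altered cheques; they are not the values of Table 4.1, and the function name is illustrative.

```python
def rates(a, b, c, d):
    """Return (TPR, FPR) from the four counts defined above.

    a: non-altered cheques wrongly flagged as altered (false positives)
    b: altered cheques correctly flagged (true positives)
    c: non-altered cheques correctly passed (true negatives)
    d: altered cheques wrongly passed (false negatives)
    """
    tpr = b / (b + d)
    fpr = a / (a + c)
    return tpr, fpr


# Hypothetical counts: B + D = 30 altered, A + C = 40 non-altered cheques.
tpr, fpr = rates(a=10, b=24, c=30, d=6)
```

With these hypothetical counts, 24 of the 30 altered cheques are caught and 10 of the 40 original cheques are wrongly flagged.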

The Receiver Operating Characteristic (ROC) curve is shown in Figure 4.1. An ROC curve is a graphical plot that illustrates the performance of a binary classifier. It is created by plotting the true positive rate against the false positive rate at various threshold values of λ.

Figure 4.1: FPR vs. TPR

The curve shown in Figure 4.2 compares the values of TPR and FPR for various values of λ. The figure illustrates that the growth of FPR is larger than that of TPR in all but a few cases.

Figure 4.2: Comparison between FPR and TPR for various λ values

Chapter 5: Conclusion and Future Work

Conclusion:

From these results, it can be concluded that if an alteration has been made with ink of the same color, we cannot detect the fraud automatically using intensity distributions alone. The intensity value depends on the ink of the pen and on the pressure and flow of writing; if these are almost the same, the intensity values do not show any large difference. As a result, the method reports many fraudulent words as non-fraudulent and vice versa. Its high rates of false acceptance and false rejection create additional complexity and difficulty, which are important considerations for both customers and bankers.

Future Work:

In this report, we restricted ourselves to checking for added words using only the intensity distribution. The results obtained with this concept are not up to the mark. If we increase the number of parameters considered, for example the orientation of words, the alignment of words and the flow of writing, we may be able to distinguish fraudulent words from non-fraudulent ones. A second direction is the alteration case, in which someone adds only a stroke of ink to an existing character or word, as shown in Figure 1.3.

References:

[1] Rajesh Kumar, Nikhil R. Pal, Bhabatosh Chanda, and J. D. Sharma, Forensic Detection of Fraudulent Alteration in Ball-Point Pen Strokes, IEEE, 2012.
[2] D. Ellen, The Scientific Examination of Documents: Methods and Techniques, 2nd edn. Taylor and Francis, London (2003).
[3] Timothy J. Ross, Fuzzy Logic with Engineering Applications, Third Edition.
[4] National Check Fraud Center (a private organization in the USA) Website, [Online]. Available:
[5] Richard A. Horton and Larry K. Nelson, An Evaluation of the Use of Laser-Induced Infrared Luminescence to Differentiate Writing Inks, Journal of Forensic Sciences, JFSCA, May
[6] Saranya Kota and Rajarshi Pal, Detection Tampered Cheque Image in Cheque Truncation System Using Difference Expansion Based Watermarking.
[7] Joel Harris, Development in the Analysis of Writing Inks on Questioned Documents, Journal of Forensic Sciences,
[8] Linda R. Taylor, Intersecting Lines as a Means of Fraud Detection, Journal of Forensic Sciences,
[9] K. M. Koppenhaver, Attorney's Guide to Document Examination. Westport, CT: Greenwood,
[10] Banking Terms and Phrases. Available:
[11] Reserve Bank of India, CTS. Available:
[12] H. Dasari and C. Bhagvati, Identification of non-black inks using HSV color spaces, in Proc. ICDAR, 2007, pp.
[13] R. Kumar, N. R. Pal, B. Chanda, and J. D. Sharma, Detection of fraudulent alterations in ball-point pen strokes using support vector machines, in Proc. IEEE INDICON, 2009, pp.
[14] R. A. Merill and E. G. Bartick, Analysis of ball point pen ink by diffuse reflectance infrared spectrometry, J. Forensic Sci., vol. 37, no. 2, pp.
[15] Mahesh Tejawat, Detection of Tempered Cheque Images in Cheque Truncation System (CTS), IDRBT, 2014.
[16] J. P. Marques, Pattern Recognition: Concepts, Methods and Applications, 2nd ed. New York: Springer,