International Journal of Science and Applied Technology

MALAY ARABIC LETTERS RECOGNITION AND SEARCHING

Sukamto 1, Elfizar 2*, Maria Erna 3
*Corresponding author
1 Information System Department, University of Riau, Indonesia, amto_s@yahoo.co.id
2 Information System Department, University of Riau, Indonesia, elfizar@lecturer.unri.ac.id
3 Faculty of Education, University of Riau, Indonesia, mariaerna@lecturer.unri.ac.id

Abstract: This paper aims to recognize and search Malay Arabic words in a digital Malay Arabic manuscript, using keywords written in Latin letters. Starting from an input image, the method reads every pixel value and then segments the rows and columns of every sentence in the manuscript to separate the letters in each line. In the training phase, feature extraction and classification are performed on the pixel values of the images using Principal Component Analysis, and the EigenLetter is calculated. This value is then compared with the ImageLetter obtained from each letter to be recognized. The smallest difference between them (the Euclidean distance) gives the recognized letter. The results are stored together with the image of the corresponding row segment. Malay Arabic words can then be searched using keywords in Latin letters. This study yields a success rate of 91.56% for recognition and 82.08% for Malay Arabic word searching.

Keywords: Malay Arabic, Letter recognition, Word searching.

1. Introduction

Besides the lack of information available to the public, some cultural objects of Riau Malay are at high risk of damage and theft. Some antiquities and historical heritage items in the Museum of Sang Nila Utama, Riau Province, have been lost [9]. Therefore, the cultural objects of the Malay people need to be digitized as a backup of information about the physical objects. This includes digitizing Malay Arabic manuscripts.

Existing research on Malay Arabic recognition only recognizes isolated letters, one at a time. Each Malay Arabic letter can be recognized based on histogram values [8].
Even voice recognition of recited Arabic letters has been done [6]. However, recognition of Malay Arabic script (combinations of letters forming words) has not yet been studied. A method is therefore needed by which a computer can recognize each letter of a Malay Arabic script taken from a digital image of the manuscript.

Many methods can be used to recognize a pattern, whether an object or a character. Among them are Matching [5], Optimum Value [3] and Principal Component Analysis (PCA) [1]. The first method still has drawbacks and does not give satisfactory results, while Optimum Value character recognition, although quite reliable, is better suited to Latin letters and numbers. PCA, in contrast, can recognize objects with higher precision.

Principal Component Analysis (PCA) is a technique used to find patterns in objects and characters. It reduces variation while keeping the necessary information, so that the remaining variation is the most prominent and represents the existing features. The reduction is achieved by discarding the parts of the matrix with the weakest values [7]. Another advantage of PCA is that applications using it run faster, because the amount of data processed has been reduced [4].

Based on the recognition results, this study also provides a search process for Malay Arabic words based on the keywords entered. The keywords used in this study are words in Latin letters.

IJSAT, Vol. 1, No. 1, December 2016 http://ijsat.unri.ac.id 45
This work lets users understand the meaning of a Malay Arabic text even if they do not know how to read the script. Users can also search for a given word in the manuscript simply by typing a Latin keyword. The findings may also serve as material for a Distributed Virtual Environment (DVE) application [10].

The next section discusses the EigenLetter and the ImageLetter, the two main variables used to recognize each Malay Arabic letter. The research methodology and the results are presented in Sections 3 and 4, respectively. Finally, Section 5 concludes the paper.

2. EigenLetter & ImageLetter

Principal Component Analysis (PCA) works on the pixel intensity values of an image arranged in a matrix [4]:

$$
\Gamma_i =
\begin{bmatrix}
p_{0,0} & p_{1,0} & \cdots & p_{N-1,0} \\
p_{0,1} & p_{1,1} & \cdots & p_{N-1,1} \\
\vdots & \vdots & \ddots & \vdots \\
p_{0,N-1} & p_{1,N-1} & \cdots & p_{N-1,N-1}
\end{bmatrix}
\tag{1}
$$

where $N$ is the length and width of the image, $p_{x,y}$ is the intensity value of the pixel at $(x, y)$, and $\Gamma_i$ is the $i$-th image, $i = 0, 1, \ldots, M-1$. The matrix in equation (1) is then converted into a row vector of size $1 \times N^2$. When $M$ images are used in the training phase, they form the matrix:

$$
\Gamma =
\begin{bmatrix}
\Gamma_0 \\ \Gamma_1 \\ \vdots \\ \Gamma_{M-1}
\end{bmatrix}
\tag{2}
$$

The difference between each input image and the average image $\Psi$ is obtained by subtracting the average image from each image:

$$
\Phi_i = \Gamma_i - \Psi, \qquad \Psi = \frac{1}{M} \sum_{i=0}^{M-1} \Gamma_i
\tag{3}
$$

where $\Psi$ is the average of the training images. Using equation (3), the covariance matrix is calculated as:

$$
C = A^{\mathsf T} A, \qquad
A =
\begin{bmatrix}
\Phi_0 \\ \Phi_1 \\ \vdots \\ \Phi_{M-1}
\end{bmatrix}
\tag{4}
$$

where $C$ is the covariance matrix and $A$ is the matrix of differences. Once the covariance matrix is known, the eigenvalues are obtained from

$$
\det(\lambda I - C) = 0
$$

where $\lambda$ denotes the eigenvalues, $I$ is the identity matrix, and $C$ is the covariance matrix. The eigenvectors are then determined using equation (5):

$$
(\lambda I - C)\,\nu = 0
\tag{5}
$$

where $\nu$ denotes the eigenvectors. The eigenvectors must be sorted by their eigenvalues in descending order, so that the strongest characteristics come first.

Euclidean distance is a technique used in image matching: the difference value it produces indicates how similar the two compared images are [7]. The Euclidean distance obtained at the end of the recognition process is calculated using equation (6).
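As an illustration of equations (1) to (5), and of the projection and nearest-distance matching that equations (6) to (8) define, here is a minimal plain-Python sketch. It is not the authors' MATLAB code, and it rests on stated assumptions: images are small $N \times N$ lists of intensities, the eigenvectors of $C = A^{\mathsf T} A$ are found with the cyclic Jacobi method (our choice, since the paper names no eigen-solver), and the function names `flatten`, `jacobi_eigen`, `project`, `train` and `recognize` are hypothetical.

```python
import math


def flatten(image):
    """Eq. (1) -> 1 x N^2 row vector: read pixel rows top to bottom (row-major)."""
    return [float(p) for row in image for p in row]


def jacobi_eigen(c, sweeps=100, tol=1e-12):
    """Eigen-decomposition of a symmetric matrix with cyclic Jacobi rotations.

    Returns (values, vectors) where vectors[k] is the unit eigenvector for values[k].
    """
    n = len(c)
    a = [row[:] for row in c]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = cs * akp - sn * akq, sn * akp + cs * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * apk - sn * aqk, sn * apk + cs * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors as columns
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = cs * vkp - sn * vkq, sn * vkp + cs * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return values, vectors


def project(basis, phi):
    """Eqs. (7)/(8): coordinates of a mean-subtracted vector on the sorted eigenvectors."""
    return [sum(u_j * phi_j for u_j, phi_j in zip(u, phi)) for u in basis]


def train(images, k):
    """Eqs. (2)-(5): returns the mean image Psi, the k leading eigenvectors, the EigenLetters."""
    rows = [flatten(img) for img in images]                      # Eq. (2): M x N^2
    m, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(d)]       # Psi
    diffs = [[r[j] - mean[j] for j in range(d)] for r in rows]   # Eq. (3): rows of A
    cov = [[sum(a[p] * a[q] for a in diffs) for q in range(d)]   # Eq. (4): C = A^T A
           for p in range(d)]
    values, vectors = jacobi_eigen(cov)                          # Eq. (5)
    order = sorted(range(d), key=lambda i: values[i], reverse=True)
    basis = [vectors[i] for i in order[:k]]                      # sorted, strongest first
    eigenletters = [project(basis, phi) for phi in diffs]        # Eq. (7)
    return mean, basis, eigenletters


def recognize(image, mean, basis, eigenletters, labels):
    """Eqs. (6) and (8): label of the training letter at the smallest Euclidean distance."""
    phi = [x - mu for x, mu in zip(flatten(image), mean)]
    imageletter = project(basis, phi)                            # Eq. (8)
    dists = [math.dist(imageletter, e) for e in eigenletters]    # Eq. (6)
    best = min(range(len(dists)), key=dists.__getitem__)
    return labels[best], dists[best]
```

Because $C$ is $N^2 \times N^2$, this direct form only suits small letter images; a common alternative is to diagonalize the smaller $M \times M$ matrix $A A^{\mathsf T}$ and map its eigenvectors back through $A^{\mathsf T}$. Recognizing a training image returns a distance of essentially zero, which mirrors the near-zero average distance reported for the training group.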
$$
\varepsilon_i = \left\lVert \mathrm{ImageLetter} - \mathrm{EigenLetter}_i \right\rVert
\tag{6}
$$

where $\varepsilon_i$ is the Euclidean distance for the $i$-th image. The EigenLetter and ImageLetter values are calculated using equations (7) and (8):

$$
\mathrm{EigenLetter}_i = \mathrm{eigenvector} \cdot \Phi_i^{\mathsf T}
\tag{7}
$$

$$
\mathrm{ImageLetter} = \mathrm{eigenvector} \cdot \Phi^{\mathsf T}, \qquad \Phi = \Gamma - \Psi
\tag{8}
$$

where eigenvector is the matrix whose rows are the sorted eigenvectors, and $\Gamma$ is the image of the letter to be recognized, flattened as in equation (1).

3. Methodology

This study uses multiple images of Malay Arabic manuscripts. MATLAB and PHP scripts are used to code the application. The first phase of the research is digitizing the Malay Arabic manuscripts. The second phase is pre-processing the digital images, which covers noise removal, sharpening and similar operations, so that the input data can be recognized [2]. Using the output of this phase, the third phase recognizes the Malay Arabic script with the PCA method. The parameters used to recognize each letter are the previously extracted characteristics, which are calculated from the eigenvalues and eigenvectors of each letter.

The fourth phase determines how Malay Arabic words are searched with keywords in Latin letters. It has two main sub-activities. The first is designing the application interface that reads a keyword and searches for the corresponding word in the Malay Arabic script; the keywords used in this process are words in the Indonesian language. The second is determining the method for line and column segmentation of the Malay Arabic script, which is required to extract every letter in the script.

4. Results and Discussion

The recognition results are stored in a database table with the data structure shown in Table 1.

Table 1. Data structure

| Attribute | Type | Description |
|---|---|---|
| row_id | Integer | Row segmentation id |
| armel | String | Directory path of the row segmentation image |
| latin | String | Recognition result |

The user interface for word searching is shown in Figure 1. When the Search button is clicked, the matching row images and their Latin text are displayed as the results. An example of the input images used in this research is shown in Figure 2. The recognition process begins with line segmentation of the Malay Arabic script (Figure 3).
The process then continues by marking columns to separate each Malay Arabic letter in each line.
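The paper does not say how the line and column boundaries are found. One common approach, assumed here, is a projection profile on a binarized page (1 = ink, 0 = background): rows without ink separate text lines, and columns without ink separate glyphs within a line. The helper names `runs`, `segment_lines` and `segment_columns` are hypothetical.

```python
def runs(profile):
    """(start, end) pairs, end exclusive, of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(page):
    """Row segmentation: horizontal projection profile, split on rows without ink."""
    row_profile = [sum(row) for row in page]
    return [page[top:bottom] for top, bottom in runs(row_profile)]


def segment_columns(line):
    """Column segmentation: vertical projection profile, split on columns without ink."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[left:right] for row in line] for left, right in runs(col_profile)]
```

Connected Malay Arabic letters share ink across columns, so an empty-column rule only splits letters that do not touch. The glyphs also come out left to right, so the list must be reversed to follow the right-to-left reading order of the script.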
Figure 1. User interface (Keyword field, Search button, Result in Image, Result in Latin text)

Figure 2. Example of input image

The results of the segmentation process are the input to the recognition of Malay Arabic letters. The first stage of recognition is feature extraction and classification of the letters to be recognized, both carried out with the Principal Component Analysis (PCA) method.

Figure 3. Line segmentation

The last stage calculates the Euclidean distance using equation (6) to identify each letter from the classification obtained.

4.1. Recognition Process

The recognition results for four groups of input images are shown in Table 2. The first group consists of images of individual Malay Arabic letters. The second to fourth groups consist of Malay Arabic script images with varying numbers of words.

Table 2. Malay Arabic recognition

| Group | Average ED (Euclidean distance) | Percentage of success (%) |
|---|---|---|
| I | $2.72 \times 10^{-14}$ | 100 |
| II | 449.09 | 87.50 |
| III | 251.69 | 88.57 |
| IV | 398.70 | 90.16 |
| Average | | 91.56 |

Table 2 shows that the highest percentage of success is obtained for the first group, probably because the images in this group are the ones used in the training phase. Across all groups, the average success rate of Malay Arabic recognition is 91.56%.

4.2. Word Searching

Figure 4. Word searching

Word searching uses keywords in Latin letters, as illustrated in Figure 4. The search operates on the Latin text obtained from the Malay Arabic recognition stage. The recognition result of each row segment is stored in a table, and a search displays the matching Latin text together with its row segmentation image. The results of the word searching process are shown in Table 3; the average success rate of word searching is 82.08%.

Table 3. Results of word searching

| Group | Percentage of success (%) |
|---|---|
| I | 100 |
| II | 75.00 |
| III | 83.33 |
| IV | 70.00 |
| Average | 82.08 |
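The storage and search steps can be sketched with Python's built-in `sqlite3` module, using the three attributes of Table 1. The table name `recognition`, the helpers `open_store`, `store_row` and `search`, and the sample rows in the usage below are hypothetical, and a case-insensitive substring match on the keyword is an assumption; the paper's PHP implementation may work differently.

```python
import sqlite3


def open_store():
    """In-memory table with the three attributes of Table 1."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recognition ("
        "row_id INTEGER PRIMARY KEY, "  # row segmentation id
        "armel TEXT, "                  # directory path of the row segmentation image
        "latin TEXT)"                   # recognition result in Latin letters
    )
    return db


def store_row(db, row_id, armel, latin):
    """Save one recognized row: its image path and its Latin transliteration."""
    db.execute("INSERT INTO recognition (row_id, armel, latin) VALUES (?, ?, ?)",
               (row_id, armel, latin))


def search(db, keyword):
    """Rows whose Latin text contains the keyword (case-insensitive), as (armel, latin)."""
    cur = db.execute(
        "SELECT armel, latin FROM recognition "
        "WHERE instr(lower(latin), ?) > 0 ORDER BY row_id",
        (keyword.lower(),),
    )
    return cur.fetchall()
```

Using `instr` rather than `LIKE` keeps characters such as `%` or `_` in a keyword from acting as wildcards. A substring match also returns rows where the keyword is part of a longer word; splitting `latin` into words before comparing would restrict hits to whole words.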
5. Conclusion

Malay Arabic script can be recognized using the PCA method. The recognition results can then be used to search for any word in a Malay Arabic manuscript with keywords in Latin letters. The application can help users learn from Malay Arabic manuscripts even without any knowledge of Malay Arabic. The success rate of Malay Arabic word searching depends on the success rate of the Malay Arabic recognition itself: the higher the letter recognition rate, the higher the word searching success rate.

Acknowledgment. This research was supported by the Ministry of Higher Education of Indonesia under the Unggulan Perguruan Tinggi Research Grant in 2015.

REFERENCES

1. AFRIZAL, A., ELFIZAR, Facial image matching based on Eigenfaces values, Seminar Nasional Matematika, Palembang, 2008.
2. ELFIZAR, NASIR, M., Camera motion reconstruction based on image sequences, Jurnal SAINTEK UNTAG Surabaya, Vol. 10, No. 2, 2009, pp. 121-127.
3. ELFIZAR, AFLINA, A., Using Optimum Value in Recognizing Vehicle Plat Number, Proceeding of the UKM-UNRI International Conference, Pekanbaru, ISBN 978-979-1222-46-4, 2008, pp. 454-456.
4. ELFIZAR, Malay Arabic recognition using principal component analysis (PCA), Proceeding of Seminar Unri UKM Malaysia, Kuala Lumpur, Malaysia, 2010.
5. HORN, B.K.P., Robot Vision, McGraw-Hill, New York, 1998.
6. ISMAIL, SALIZA, AHMAD, A. M., Recurrent Neural Network with Backpropagation through Time Algorithm for Arabic Recognition, Proceedings 18th European Simulation Multiconference, Graham Horton, ISBN 3-93150-35-4, 2004.
7. SHLENS, J., A Tutorial on Principal Component Analysis, http://www.mathworks.com, accessed on 10 February 2015.
8. ROFEAH, N., SEYED MOHAMED BUHARI M. I., Histogram Setup to Recognize Arabic Letters, International Conference on Artificial Intelligence, Las Vegas, USA, 2002.
9. RIAU POS, Museum Sang Nila Utama dibobol maling, Harian Riau Pos, 3 January 2014.
10. ELFIZAR, BABA, M.S., HERAWAN, T., A Large Scale Distributed Virtual Environment Architecture, Studies in Informatics and Control, Vol. 24, Issue 2, 2015, pp. 159-170.