MALAY ARABIC LETTERS RECOGNITION AND SEARCHING


International Journal of Science and Applied Technology (IJSAT), Vol. 1, No. 1, December 2016
http://ijsat.unri.ac.id

Sukamto 1, Elfizar 2*, Maria Erna 3
*Corresponding author
1 Information System Department, University of Riau, Indonesia, amto_s@yahoo.co.id
2 Information System Department, University of Riau, Indonesia, elfizar@lecturer.unri.ac.id
3 Faculty of Education, University of Riau, Indonesia, mariaerna@lecturer.unri.ac.id

Abstract: This paper aims to recognize and search for Malay Arabic words in a digital Malay Arabic manuscript. The keywords used are in Latin letters. Given an input image, the method reads every pixel value and then segments the manuscript into rows and columns in order to separate the letters in each line. In the training phase, feature extraction and classification using Principal Component Analysis (PCA) on the pixel values of the images yield the EigenLetter. This value is then compared with the ImageLetter obtained from each letter to be recognized. The smallest difference between them (Euclidean distance) identifies the recognized letter. The results are stored along with the image of the row segmentation. Malay Arabic words can then be searched using keywords in Latin letters. This study yields a success rate of 91.56% for recognition and 82.08% for Malay Arabic word searching.

Keywords: Malay Arabic, Letter recognition, Word searching.

1. Introduction

Besides the lack of information available to the public, some cultural objects of Riau Malay are at high risk of damage and theft. Some antiquities and historical heritage items in the Museum of Sang Nila Utama, Riau Province, have been lost [9]. Therefore, digitization of Malay cultural objects is needed as backup information for the physical objects. This includes the digitization of Malay Arabic manuscripts.

Existing research on Malay Arabic recognition only recognizes letter by letter. Each Malay Arabic letter can be recognized based on histogram values [8]. Even voice recognition of a recited Malay Arabic letter has been done [6]. However, recognition of Malay Arabic script (a combination of letters, or words) has not been studied. Thus, a method is needed by which the computer can recognize each letter constituting a Malay Arabic script taken from a digital image of the manuscript.

Many methods can be used to recognize a pattern, whether an object or a character. Some of them are Matching [5], Optimum Value [3] and Principal Component Analysis (PCA) [1]. The first method still has drawbacks and does not provide satisfactory results. The Optimum Value method gives quite reliable results, but it is more appropriate for Latin letters and numbers. In contrast, PCA can provide recognition results with higher precision.

PCA is a technique used to find patterns (objects and characters). It reduces the variation while maintaining the necessary information, so that the remaining variation is the most prominent and represents the existing features. The reduction is accomplished by discarding parts of the matrix, starting from those with the weakest values [7]. Another advantage of PCA is that applications using it run faster, because the data used in the process has been reduced [4].

Based on the recognition results, this study further provides a search process for Malay Arabic words based on the keywords entered. The keywords used in this study are words in Latin letters.

This paper helps users understand the meaning of Malay Arabic text even if they do not know how to read it. Furthermore, users may search for a certain word contained in a manuscript just by typing a Latin keyword. The findings may also serve as material for the Distributed Virtual Environment (DVE) application [10].

The next section discusses the EigenLetter and the ImageLetter, the main variables used to recognize each Malay Arabic letter. The research methodology and the results are presented in sections 3 and 4, respectively. Finally, section 5 concludes the research.

2. EigenLetter & ImageLetter

Principal Component Analysis (PCA) uses the pixel intensity values of the image in a matrix [4]:

$$\Gamma_i = \begin{bmatrix} p_{0,0} & p_{1,0} & \cdots & p_{N-1,0} \\ p_{0,1} & p_{1,1} & \cdots & p_{N-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ p_{0,N-1} & p_{1,N-1} & \cdots & p_{N-1,N-1} \end{bmatrix} \tag{1}$$

where $N$ = length and width of the image, $p$ = intensity value of the pixels, and $\Gamma_i$ = the $i$-th image, $i = 0, 1, \ldots, M-1$. The matrix in equation (1) is then converted into a vector of size $1 \times N^2$. When $M$ images are used in the training phase, they form the matrix:

$$\begin{bmatrix} \Gamma_0 \\ \Gamma_1 \\ \vdots \\ \Gamma_{M-1} \end{bmatrix} \tag{2}$$

The difference between each input image and the average image ($\Psi$) is calculated by subtracting the average image from each image:

$$\Phi_i = \Gamma_i - \Psi \tag{3}$$

where $\Psi$ = the average value of the images. Using equation (3), the covariance matrix is calculated by the formula:

$$C = A A^{T} \tag{4}$$

with $C$ = the covariance matrix and $A$ = the matrix of differences. From the covariance matrix, the eigenvalues are calculated:

$$\det(\lambda I - C) = 0$$

with $\lambda$ = eigenvalues, $I$ = the identity matrix, and $C$ = the covariance matrix. The eigenvectors can be determined using equation (5):

$$(\lambda I - C)\, v = 0 \tag{5}$$

with $v$ = eigenvectors. The resulting eigenvectors must be sorted so that the most prominent characteristics come first.

Euclidean distance is a technique used in image matching. The difference value it produces shows the similarity between the two images being compared [7]. The Euclidean distance value obtained at the end of the recognition process is calculated using equation (6).
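The training steps in equations (1) to (5) and the matching by smallest Euclidean distance can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `train_eigenletters` and `recognize`, the parameter `k` (number of components kept), the label list, and the unit-length normalization of the components are all hypothetical choices. The projection used here is the standard eigenface-style one and may differ in detail from the paper's own definitions of EigenLetter and ImageLetter.

```python
import numpy as np

def train_eigenletters(images, k):
    """Train on M square letter images; keep the k strongest components (k <= M - 1)."""
    images = np.asarray(images, dtype=float)
    m = images.shape[0]
    gamma = images.reshape(m, -1)            # each image flattened to a 1 x N^2 row
    psi = gamma.mean(axis=0)                 # average image
    a = gamma - psi                          # matrix of differences, one row per image
    c = a @ a.T                              # small M x M matrix A A^T
    eigvals, eigvecs = np.linalg.eigh(c)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    components = eigvecs[:, order].T @ a     # map eigenvectors back to image space
    norms = np.linalg.norm(components, axis=1, keepdims=True)
    components = components / np.maximum(norms, 1e-12)  # assumption: unit-length components
    weights = a @ components.T               # projection of every training image
    return psi, components, weights

def recognize(image, psi, components, weights, labels):
    """Return the label of the closest training image and its Euclidean distance."""
    phi = np.asarray(image, dtype=float).reshape(-1) - psi
    w = components @ phi                     # projection of the image to be recognized
    distances = np.linalg.norm(weights - w, axis=1)
    best = int(np.argmin(distances))
    return labels[best], float(distances[best])
```

When a training image itself is passed to `recognize`, the returned distance is zero up to floating-point rounding, which is in line with the very small average distance the paper reports for images used in training.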

$$\varepsilon_i = \lVert \text{imageletter} - \text{eigenletter}_i \rVert \tag{6}$$

with $\varepsilon_i$ = Euclidean distance for the $i$-th image. The EigenLetter and ImageLetter values are calculated using equations (7) and (8):

$$\text{eigenletter}_i = \text{eigenvector} \cdot \Phi_i \tag{7}$$

$$\text{imageletter} = \text{eigenvector} \cdot \Phi_{\text{new}} \tag{8}$$

where the eigenvectors are sorted and $\Phi_{\text{new}}$ is the difference between the letter image to be recognized and the average image.

3. Methodology

This study uses multiple images of Malay Arabic manuscripts. MATLAB software and PHP scripts are used for application coding.

The first phase of the research is digitizing the Malay Arabic script. The second is pre-processing the digital images, which covers removing noise, increasing sharpness, and other steps needed so that the input data can be recognized [2]. Using the output of the second phase, the third phase is recognizing the Malay Arabic script with the PCA method. The parameters used for recognizing each letter are its previously extracted characteristics, calculated from the eigenvalues and eigenvectors of every letter.

The fourth phase is determining the search method for Malay Arabic words using keywords in Latin letters. This stage has two main sub-activities. The first is determining the application interface that reads the keyword used to search for a word in the Malay Arabic script. The keywords used in this process are words in the Indonesian language. The second is determining the method for line and column segmentation of the Malay Arabic script, which is required to extract every letter in the script.

4. Result and Discussion

The recognition results are stored in a database table with the data structure shown in Table 1.

Table 1. Data structure

Attribute   Type      Description
row_id      Integer   Row segmentation id
armel       String    Directory path of row segmentation image
latin       String    Recognition result

The user interface for word searching is illustrated in Figure 1. When the Search button is clicked, the images and Latin letters are displayed as the results. An example of the input images used in the research can be seen in Figure 2. The recognition process begins with line segmentation of the Malay Arabic script (Figure 3). The process then continues by marking columns to separate each Malay Arabic letter contained in each line.
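The line and column segmentation described above can be sketched as follows. The paper does not state which segmentation technique was used, so this sketch assumes a simple projection profile on a binarized image (1 = ink, 0 = background): rows or columns whose ink count is zero are treated as gaps. The function names `find_runs`, `segment_rows`, and `segment_columns` are hypothetical.

```python
def find_runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero entries."""
    runs, start = [], None
    for idx, value in enumerate(profile):
        if value and start is None:
            start = idx                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, idx))        # the run ends at the first empty entry
            start = None
    if start is not None:
        runs.append((start, len(profile)))   # run reaching the end of the profile
    return runs

def segment_rows(binary):
    """Find text lines: consecutive image rows that contain at least one ink pixel."""
    profile = [sum(row) for row in binary]
    return find_runs(profile)

def segment_columns(binary, top, bottom):
    """Within one text line (rows top..bottom-1), find column runs that contain ink."""
    width = len(binary[0])
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(profile)
```

Each `(top, bottom)` pair from `segment_rows` can be passed to `segment_columns` to obtain the column ranges in that line. Letters that are joined in the script produce no empty column between them, so a gap-based split like this one would need additional handling for connected letters.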

Figure 1. User interface (Keyword field, Search button, Result in Image, Result in Latin text)

Figure 2. Example of input image

The results of the segmentation process are used as input to the recognition of Malay Arabic letters. In the recognition process, the first stage is feature extraction and classification of the letters to be recognized. This whole process is carried out with the Principal Component Analysis (PCA) method.

Figure 3. Line segmentation

The last stage is to calculate the Euclidean distance using equation (6) in order to identify the required letter based on the classification obtained.

4.1. Recognition Process

The recognition results for four groups of input images are shown in Table 2. The input to the first group consists of images of individual Malay Arabic letters. The second to fourth groups consist of Malay Arabic script images with varying numbers of words.

Table 2. Malay Arabic recognition

Group     Average ED       Percentage of success (%)
I         2.72 x 10^-14    100
II        449.09           87.50
III       251.69           88.57
IV        398.70           90.16
Average                    91.56

The results in Table 2 show that the highest percentage of success is obtained for the input images of the first group. This is expected, since the images in the first group are used in the training phase. Across all groups, the average success of Malay Arabic recognition is 91.56%.

4.2. Word Searching

Figure 4. Word searching

Word searching uses keywords in Latin letters, as illustrated in Figure 4. The process is based on the Latin letters obtained from the Malay Arabic recognition stage. The recognition result of each row segmentation is stored in a table. The output of a search displays the Latin letters and the corresponding row segmentation image. The results of the word searching process are shown in Table 3, from which the average success of word searching is 82.08%.

Table 3. Results of word searching

Group     Percentage of success (%)
I         100
II        75.00
III       83.33
IV        70.00
Average   82.08
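The search step over the stored recognition results can be sketched as below. The paper implements the search with PHP and a database table; this sketch instead assumes the rows of Table 1 have been loaded into a list of dictionaries with the keys `row_id`, `armel`, and `latin`, and it uses case-insensitive substring matching, which is an assumption. The function name `search_rows` and all sample values are hypothetical.

```python
def search_rows(rows, keyword):
    """Return the stored rows whose recognized Latin text contains the keyword."""
    needle = keyword.strip().lower()
    if not needle:
        return []                            # an empty keyword matches nothing
    return [row for row in rows if needle in row["latin"].lower()]

# Hypothetical stored results: row id, path of the row image, recognized Latin text.
rows = [
    {"row_id": 1, "armel": "rows/row_1.png", "latin": "hikayat raja"},
    {"row_id": 2, "armel": "rows/row_2.png", "latin": "negeri melayu"},
    {"row_id": 3, "armel": "rows/row_3.png", "latin": "raja melayu"},
]

matches = search_rows(rows, "Raja")
```

Each match carries both the Latin text and the path of the row image, so an interface can display them side by side. Because the search runs on the recognized Latin text, a letter misrecognized in a row makes a keyword miss that row, which is consistent with the search success rate being lower than the recognition rate.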

5. Conclusion

Malay Arabic can be recognized by using the PCA method. The result of this process can then be used to search for any word in a Malay Arabic manuscript by using keywords in Latin letters. One can use this application to study a Malay Arabic manuscript without any knowledge of Malay Arabic. The success rate of Malay Arabic word searching depends on the success of the recognition itself: the higher the success rate of letter recognition, the higher the success rate of word searching.

Acknowledgment. This research was supported by the Ministry of Higher Education of Indonesia under the Unggulan Perguruan Tinggi Research Grant in 2015.

REFERENCES

1. AFRIZAL, A., ELFIZAR, Facial image matching based on Eigenfaces values, Seminar Nasional Matematika, Palembang, 2008.
2. ELFIZAR, NASIR, M., Camera motion reconstruction based on image sequences, Jurnal SAINTEK UNTAG Surabaya, Vol. 10, No. 2, 2009, pp. 121-127.
3. ELFIZAR, AFLINA, A., Using Optimum Value in Recognizing Vehicle Plat Number, Proceeding the UKM-UNRI International Conference, Pekanbaru, ISBN 978-979-1222-46-4, 2008, pp. 454-456.
4. ELFIZAR, Malay Arabic recognition using principal component analysis (PCA), Proceeding of Seminar Unri UKM Malaysia, Kuala Lumpur, Malaysia, 2010.
5. HORN, B.K.P., Robot Vision, McGraw-Hill, New York, 1998.
6. ISMAIL, SALIZA, AHMAD, A. M., Recurrent Neural Network with Backpropagation through Time Algorithm for Arabic Recognition, Proceedings 18th European Simulation Multiconference, Graham Horton, ISBN 3-93150-35-4, 2004.
7. SHLENS, J., A Tutorial on Principal Component Analysis, http://www.mathworks.com. Accessed on 10 February 2015.
8. ROFEAH, N., SEYED MOHAMED BUHARI M. I., Histogram Setup to Recognize Arabic Letters, International Conference on Artificial Intelligence, Las Vegas, USA, 2002.
9. RIAU POS, Museum sang nila utama dibobol maling, Harian Riau Pos, 3 January 2014.
10. ELFIZAR, BABA, M.S., HERAWAN, T., A Large Scale Distributed Virtual Environment Architecture, Studies in Informatics and Control, vol. 24, issue 2, 2015, pp. 159-170.