Optical Character Recognition (OCR) refers to the branch of computer science that involves reading text from paper and translating the images into a form that a computer can process (for instance, converting them into ASCII codes). An OCR system allows us to take a magazine, book, or article, feed it directly into an electronic computer file, and then edit the file using a word processor (webopedia, 2009). Urdu is written in a script similar to Arabic, which is used widely in many countries. No such work had been done previously. This research offered improved recognition of Urdu script, and Pal & Sarkar (2003) also developed a prototype system that attained an average character-level accuracy of 97.8%.
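ASCII covers only code points 0 to 127, which is enough for Latin text but not for Urdu. Urdu letters are encoded in the Unicode Arabic block (U+0600 to U+06FF), so an Urdu OCR system would in practice emit Unicode text rather than ASCII. The minimal sketch below assumes the recognizer's output is already available as a Python string and simply reports each character's code point; the function name `describe_code_points` is hypothetical.

```python
def describe_code_points(text: str) -> list[tuple[str, str, bool]]:
    """Return (character, code point, is_ascii) for each character in text."""
    result = []
    for ch in text:
        cp = ord(ch)  # integer Unicode code point
        result.append((ch, f"U+{cp:04X}", cp < 128))  # ASCII is 0..127
    return result


# The word "Urdu" in Urdu script, written as escapes: alef, reh, dal, waw.
word = "\u0627\u0631\u062F\u0648"
for ch, cp, is_ascii in describe_code_points(word):
    print(cp, is_ascii)  # every character is outside ASCII

print(describe_code_points("A"))  # a Latin letter, for comparison
```

None of the four Urdu characters is ASCII, whereas the Latin "A" (U+0041) is.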
All Optical Character Recognition systems include an optical scanner for reading the text, as well as sophisticated software for analyzing the images. Most OCR systems use a combination of hardware (in particular, specialized circuit boards) and software to recognize characters, while some low-cost systems do it entirely in software. Advanced OCR systems can read text in a wide variety of fonts; however, they still have trouble with handwritten text (webopedia, 2009). The potential of OCR systems is enormous because they enable users to harness the power of computers to process printed documents. OCR is already being used widely in the legal profession, education, research, and print media (webopedia, 2009). However, comparatively little work has been done on recognizing other scripts (e.g., Arabic, Hindi, Urdu).
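The image-analysis software described above can take many forms. One of the simplest classical techniques is template matching: compare a binarized character image with stored templates and choose the closest one. The sketch below is a toy illustration under stated assumptions (characters are already segmented and binarized into tiny grids of equal size; the 3x3 templates and their labels are made up), not the method used by any system cited in this paper.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # 1 = ink pixel, 0 = background


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Return the label of the template with the fewest differing pixels."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


# Hypothetical 3x3 templates; real systems use much larger images and richer features.
TEMPLATES: Dict[str, Bitmap] = {
    "vertical_bar": [[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]],
    "dot": [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]],
    "horizontal_bar": [[0, 0, 0],
                       [1, 1, 1],
                       [0, 0, 0]],
}

# A noisy scan of a vertical stroke (one stray ink pixel at the top left).
noisy = [[1, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
print(recognize(noisy, TEMPLATES))  # vertical_bar (1 pixel differs)
```

Template matching tolerates small amounts of noise but breaks down with font variation and cursive joining, which is one reason real OCR systems rely on more robust features.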
The Urdu script is complex: it is cursive, and most letters change shape depending on their position within a word. Urdu has 39 letters and 10 numeral characters.
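Urdu's ten numeral characters are commonly encoded as the Unicode Extended Arabic-Indic digits U+06F0 to U+06F9. As a minimal post-processing sketch, assuming the recognizer already outputs those code points, recognized Urdu digits can be mapped to ASCII digits with a translation table. The names `URDU_DIGITS` and `to_ascii_digits` are hypothetical.

```python
# Extended Arabic-Indic digits zero..nine (U+06F0..U+06F9), used for Urdu numerals.
URDU_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))

# Translation table mapping each Urdu digit to the ASCII digit with the same value.
_TO_ASCII = str.maketrans(URDU_DIGITS, "0123456789")


def to_ascii_digits(text: str) -> str:
    """Replace Urdu digits with ASCII digits; leave all other characters unchanged."""
    return text.translate(_TO_ASCII)


recognized = "\u06F2\u06F0\u06F0\u06F9"  # Urdu digits for 2009
print(to_ascii_digits(recognized))  # 2009
print(int(to_ascii_digits(recognized)) + 1)  # 2010
```

A translation table keeps the mapping explicit and leaves letters untouched, so it can run over a whole line of recognized text.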