Introduction

With the advent of computers and Internet technology, the possibilities for collecting data and using it for various purposes have exploded. These possibilities are especially enticing for textual data. Converting the vast amount of text accumulated over human history into digital form is vital for storage, data mining, sentiment analysis, and similar tasks, all of which contribute to the progress of our society. The tool used for this purpose is called OCR.

Like many other languages, Bengali can benefit from OCR technology, especially since it is the seventh most spoken language in the world, with around 300 million speakers. Bengali speakers are found primarily in Bangladesh, in the Indian states of West Bengal, Assam and Tripura, in the Andaman and Nicobar Islands, and in the ever-growing diaspora in the United Kingdom (UK), the United States (USA), Canada, the Middle East, Australia, Malaysia, etc. Advances in the digital use of the Bengali language are therefore of interest to many countries.

Basic Study

OCR is short for optical character recognition. It is a technology for converting images of printed or handwritten text into a machine-readable, i.e. digital, format. Although OCR today is predominantly focused on digitizing text, earlier OCR systems were analog. The world's first OCR is believed to have been invented by the American inventor Charles R. Carey, who built an image transmission system using a mosaic of photocells. Later inventions focused on scanning documents to produce multiple copies or to convert them into telegraph code, and digital output gradually became more popular.
In 1966, IBM's Rochester laboratory developed the IBM 1287, the first scanner capable of reading handwritten numbers. The first commercial OCR was introduced in 1977 by Caere Corporation. In 2000, OCR began to be offered online as a service (WebOCR) on a variety of platforms via cloud computing.

Based on its method, OCR can be divided into two types:

Online OCR (not to be confused with "online" in the Internet sense) converts text automatically as it is written on a special digitizer or PDA, where a sensor detects pen-tip movements and pen-up/pen-down events. This type of data is known as digital ink and can be considered a digital representation of handwriting. The resulting signal is converted into letter codes that can be used in computers and word processing applications.

Offline OCR scans an image as a whole and does not use stroke order. It is a form of image processing, since it tries to recognize character patterns in image files.

Online OCR can only process text as it is being written, in real time, whereas offline OCR can process images of both handwritten and printed text and needs no special device.

Most of the successful research in Bangla OCR so far has been conducted on printed text, although researchers are gradually focusing more on handwritten text recognition. Sanchez and Pal proposed a classical line-based approach for continuous Bangla handwriting recognition based on hidden Markov models and n-gram models. They used both a word-based language model (LM) and a character-based LM in their experiments and obtained better results with the word-based LM. Garain, Mioulet, Chaudhuri, Chatelain, and Paquet developed a recurrent neural network model to recognize unconstrained Bengali handwriting at the character level. They used a BLSTM-CTC-based recognizer on a dataset of 2,338 unconstrained Bangla handwritten lines, approximately 21,000 words in total.
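For readers unfamiliar with CTC, the decoding step of a CTC-based recognizer can be sketched as follows. This is a generic greedy (best-path) decoder, not the code used by Garain et al.: the function name `ctc_greedy_decode`, the choice of index 0 for the blank label, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC "blank" label (a convention, not from the paper)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: take the best label in each frame,
    collapse consecutive repeats, then drop blanks."""
    # Best label index for every time step (frame).
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy per-frame scores over 3 classes: blank (0) and two characters (1, 2).
toy_scores = [
    [0.1, 0.8, 0.1],    # 1
    [0.2, 0.7, 0.1],    # 1 (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # 1 (new occurrence, separated by a blank)
    [0.1, 0.2, 0.7],    # 2
    [0.1, 0.2, 0.7],    # 2 (repeat, collapsed)
    [0.8, 0.1, 0.1],    # blank
]
decoded_labels = ctc_greedy_decode(toy_scores)
```

The blank label is what allows the same character to appear twice in a row: the two runs of label 1 above are kept as separate characters because a blank frame sits between them.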
Instead of horizontal segmentation, Garain et al. chose vertical segmentation, classifying words into “semi-ortho syllables.” Their experiment produced an accuracy of 75.40% without any post-processing. Hasnat, Chowdhury and Khan developed a Tesseract-based OCR for the Bangla script, which they applied to printed documents. They achieved a maximum accuracy of 93% on clean printed documents and a minimum accuracy of 70% on screen-printed images. This drop shows that the approach is very sensitive to variations in letter shapes and is not well suited to Bengali script character recognition.

Chowdhury and Rahman proposed an optimal neural network setup for Bangla handwritten numeral recognition, consisting of two convolution layers with tanh activation, a hidden layer with tanh activation, and an output layer with softmax activation. To recognize the 10 Bengali numerals, they used a dataset of 70,000 samples and obtained an error rate of 1.22% to 1.33%. Purkayastha, Datta and Islam also used a convolutional neural network for Bengali handwritten character recognition. They were the first to work on compound Bengali handwritten characters. Their recognition experiment also included numerals and the basic alphabet. They achieved 98.66% accuracy on numerals and 89.93% accuracy on almost all Bengali characters (80 classes).

Several projects have been developed for Bangla OCR; notably, none of them works on handwritten text:

BanglaOCR is an open source OCR developed by Hasnat, Chowdhury and Khan that uses the Google Tesseract engine for character recognition and works on printed documents, as discussed in Section 3.1.

Puthi OCR, also known as GIGA Text Reader, is a cross-platform Bangla OCR application developed by Giga TECH. It works on printed documents written in Bengali, English and Hindi. The Android app is free to download, but the desktop and enterprise versions require payment.
Chitrolekha is another Bengali OCR that uses Google Tesseract and the OpenCV image library. The application is free and appears to have been available on the Google Play Store in the past, but as of 15.07.2018 it is no longer available.

i2OCR is a multilingual OCR that supports more than 60 languages, including Bengali.

Proposed Methodology and Implementation

Deep CNN stands for deep convolutional neural network. To understand it, we first look at neural networks in general. Neural networks are machine learning tools inspired by the architecture of the human brain. The most basic version of the artificial neuron is called a perceptron, which makes a decision by weighting its inputs and comparing the result against a threshold value. A neural network consists of interconnected perceptrons whose connections differ depending on the configuration. The simplest topology is the feed-forward network, composed of three layers: an input layer, a hidden layer, and an output layer. Deep neural networks have more than one hidden layer, so a deep CNN is a convolutional neural network with more than one hidden layer. Now we come to the convolutional neural network itself. While neural networks are inspired by the human brain, CNNs are another type of..
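To make the perceptron and the three-layer feed-forward network described above more concrete, here is a minimal sketch in plain Python. It is illustrative only and is not the implementation proposed in this essay: the function names, the tanh hidden activation, the softmax output, and all weights are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def perceptron(inputs: Sequence[float], weights: Sequence[float],
               threshold: float) -> int:
    """Fire (1) if the weighted sum of inputs reaches the threshold, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer: a weighted sum plus bias per output unit."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def softmax(values: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def feed_forward(inputs: Sequence[float],
                 hidden_w: Sequence[Sequence[float]], hidden_b: Sequence[float],
                 out_w: Sequence[Sequence[float]], out_b: Sequence[float]) -> List[float]:
    """Input layer -> one hidden layer (tanh) -> output layer (softmax)."""
    hidden = [math.tanh(v) for v in dense(inputs, hidden_w, hidden_b)]
    return softmax(dense(hidden, out_w, out_b))


# A perceptron acting as a logical AND gate (hypothetical weights and threshold).
and_on = perceptron([1, 1], [1.0, 1.0], threshold=1.5)
and_off = perceptron([1, 0], [1.0, 1.0], threshold=1.5)

# A tiny network: 2 inputs, 3 hidden units, 2 output classes (arbitrary weights).
probabilities = feed_forward(
    [0.5, -1.0],
    hidden_w=[[0.2, -0.4], [0.7, 0.1], [-0.3, 0.8]],
    hidden_b=[0.0, 0.1, -0.1],
    out_w=[[0.5, -0.6, 0.3], [-0.2, 0.4, 0.9]],
    out_b=[0.0, 0.0],
)
```

A deep network in the sense used above would simply chain more than one hidden layer between the input and the output.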