Arabic-Handwritten-Characters-Recognition using ResNet-50

Senan Jadeed

Ironhack Data Analytics Bootcamp (Jan-Mar 2020), Berlin

Content

Project Description
Questions & Hypotheses
Final Results
Repository contents
Links

Project Description

You can get this dataset from the Kaggle website.

The Arabic Letters Dataset is composed of 16,800 characters written by 60 participants. The participants are between 19 and 40 years old, and 90% of them are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times. The images were scanned at a resolution of 300 dpi. Each block was segmented automatically using Matlab 2016a to determine its coordinates. The dataset is partitioned into two sets: a training set of 13,440 characters (480 images per class) and a test set of 3,360 characters (120 images per class). The writers of the training set and the test set are mutually exclusive. The order in which writers were assigned to the test set was randomized so that the test-set writers do not all come from a single institution, which ensures variability in the test set.
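The figures in the description can be cross-checked with a little arithmetic. The short sketch below derives the number of classes and the number of writers per split from the quoted totals; the variable names are illustrative and do not come from the dataset or the notebooks.

```python
# Figures quoted in the dataset description.
PARTICIPANTS = 60
REPETITIONS = 10        # each participant wrote each character ten times
TOTAL_IMAGES = 16_800
TRAIN_PER_CLASS = 480
TEST_PER_CLASS = 120

# Derived: number of character classes ('alef' to 'yeh').
num_classes = TOTAL_IMAGES // (PARTICIPANTS * REPETITIONS)

# Derived: size of each split.
train_images = TRAIN_PER_CLASS * num_classes
test_images = TEST_PER_CLASS * num_classes

# Writers are exclusive between the splits, so each split corresponds
# to a whole number of writers.
train_writers = TRAIN_PER_CLASS // REPETITIONS
test_writers = TEST_PER_CLASS // REPETITIONS

print(num_classes, train_images, test_images, train_writers, test_writers)
```

This prints `28 13440 3360 48 12`: 28 classes, with 48 writers contributing to the training set and 12 to the test set.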

ResNet-50 is a convolutional neural network trained on more than a million images from the ImageNet database. The network is 50 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, it has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224 pixels (Source).

Citation

Ahmed El-Sawy, Mohamed Loey, and Hazem EL-Bakry, "Arabic Handwritten Characters Recognition using Convolutional Neural Network", WSEAS, 2017.

I had no role in constructing the original images; all credit goes to the original authors of the dataset and their paper cited above. The original dataset stores the images as arrays of size (32,32,1); this dataset only converts the arrays to their respective JPG images.
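The conversion from a (32,32,1) array to a JPG file could look roughly like the sketch below. It is not the script that produced this dataset: the function name `array_to_jpg` is hypothetical, it uses NumPy and Pillow, and it assumes pixel values are already in the 0-255 range.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def array_to_jpg(pixels, out_path):
    """Save one (32, 32, 1) grayscale array as a JPG file (hypothetical helper)."""
    arr = np.asarray(pixels)
    if arr.shape != (32, 32, 1):
        raise ValueError(f"expected shape (32, 32, 1), got {arr.shape}")

    # Drop the trailing channel axis to get a 2-D grayscale image.
    # Assumption: values are already in the 0-255 range.
    gray = np.clip(arr[:, :, 0], 0, 255).astype(np.uint8)

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # A 2-D uint8 array is interpreted by Pillow as an 8-bit grayscale image.
    Image.fromarray(gray).save(out_path, format="JPEG")
    return out_path
```

Writing each image into a folder named after its class (for example `train/<label>/<id>.jpg`, a layout assumed here rather than taken from the repository) would make the result directly loadable by folder-based image loaders.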

The main notebook uses a pre-trained ResNet-50 from the PyTorch models together with the Fast.AI library, which makes training and validation easy and fast.
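As a rough illustration of this setup, the sketch below builds a learner around an ImageNet-pretrained ResNet-50 using the current fastai API. It is not the notebook's code, which may use an older fastai version. The function names, the `train`/`test` folder layout with one subfolder per class, the batch size, and the epoch count are all assumptions made for the example.

```python
def build_learner(data_dir, batch_size=64):
    """Build a fastai learner around a pre-trained ResNet-50 (illustrative)."""
    # Imported lazily so that defining this function does not require fastai.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        accuracy,
        resnet50,
        vision_learner,
    )

    # Assumed layout: data_dir/train/<class>/*.jpg and data_dir/test/<class>/*.jpg.
    # The test split is used as the validation set in this sketch.
    dls = ImageDataLoaders.from_folder(
        data_dir,
        train="train",
        valid="test",
        item_tfms=Resize(224),  # ResNet-50 was trained on 224-by-224 inputs
        bs=batch_size,
    )

    # vision_learner loads ImageNet weights by default and replaces the
    # 1000-class head with one sized for the classes found in the folders.
    return vision_learner(dls, resnet50, metrics=accuracy)


def train(data_dir, epochs=5):
    """Fine-tune the learner; the epoch count is a placeholder, not a tuned value."""
    learn = build_learner(data_dir)
    learn.fine_tune(epochs)
    return learn
```

`fine_tune` first trains the new head with the pre-trained body frozen and then unfreezes the whole network, which is the usual transfer-learning recipe in fastai.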

Acknowledgement

All credit goes to the original authors of this dataset, who made available a great dataset that is essential for anyone looking into Arabic character recognition. I hope to see more like it in other fields of Arabic literature.

Questions & Hypotheses

The Arabic language has many fonts and writing styles. The idea behind this project is to develop a model that can recognize characters written in different handwriting styles. This is highly difficult, because cursive writing styles can make some characters nearly unrecognizable even to the trained human eye. In my approach, I use the same basic image-processing steps introduced by the Fast.AI authors, and in the last stage I fine-tune the parameters using the Fast.AI library's capabilities. This achieves an accuracy of 98.74%, an improvement over the highest accuracy I was able to find in any official paper (97.3%), as shown in the Final Results section. After finishing the analysis, I tested the model against a small dataset that I wrote myself and concluded that a better dataset with a wider range of writing styles is needed to cover more scenarios where the letters are not easily recognizable.

Final Results

A comparison between the literature and the two models I tested is shown in the table below:

Publication title	Authors	Accuracy achieved	Date of publication
High Accuracy Arabic Handwritten Characters Recognition Using Error Back Propagation Artificial Neural Networks	Assist. Prof. Majida Ali Abed & Assist. Prof. Dr. Hamid Ali Abed Alasad	93.61%	February 2015
Convolutional Neural Network Model for Arabic Handwritten Characters Recognition	Murtada Khalafallah Elbashir & Mohamed Elhafiz Mustafa	93.5%	November 2018
ARABIC HANDWRITTEN CHARACTER RECOGNITION BASED ON DEEP CONVOLUTIONAL NEURAL NETWORKS	Khaled S. Younis	97.6%	December 2017
Arabic-Handwritten-Characters-Recognition using CNN_ResNet-18	Senan Jadeed	97.23%	March 2020
Arabic-Handwritten-Characters-Recognition using CNN_ResNet-50	Senan Jadeed	98.74%	March 2020

The final results show that the basic model using ResNet-18 achieves an accuracy comparable to the highest accuracy previously reported in the literature. With ResNet-50, the accuracy is higher still, at 98.74%. When I put the model to the test on my own handwriting, the results suggested that the quality of the dataset is not high enough; with a better dataset, higher accuracies on external datasets could be anticipated.
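To put the difference in perspective, the accuracies in the table above can be converted into error rates. The sketch below does this with the table's figures, under the assumption that all reported accuracies are directly comparable (same dataset and evaluation protocol), which the table itself does not confirm. The dictionary keys are shortened labels chosen for the example.

```python
# Accuracies (in percent) copied from the comparison table.
reported = {
    "Abed & Alasad (2015)": 93.61,
    "Elbashir & Mustafa (2018)": 93.5,
    "Younis (2017)": 97.6,
    "ResNet-18 (this project)": 97.23,
    "ResNet-50 (this project)": 98.74,
}


def error_rate(accuracy_pct):
    """Error rate in percent for a given accuracy in percent."""
    return 100.0 - accuracy_pct


# Best accuracy among the previously published results only.
best_prior = max(acc for name, acc in reported.items() if "this project" not in name)
resnet50_acc = reported["ResNet-50 (this project)"]

absolute_gain = resnet50_acc - best_prior
relative_error_reduction = (
    error_rate(best_prior) - error_rate(resnet50_acc)
) / error_rate(best_prior)

print(f"Absolute gain: {absolute_gain:.2f} percentage points")
print(f"Relative error reduction: {relative_error_reduction * 100:.1f}%")
```

With the table's numbers, the gain over the best prior result is about 1.14 percentage points, which corresponds to cutting the error rate from 2.4% to 1.26%, a reduction of roughly 47.5%.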

Repository contents

Data folder that contains the dataset used and the models produced.
Extras folder that contains the content used in preparing the presentation, the presentation PDF file, and the dataset as a WinRAR archive.
ShowTest folder that contains the small dataset I created to test the models produced.
Jupyter notebooks that were used in producing the models and another notebook used in testing the models.

Links

Final Project Repository

Google Slides

For more info please feel free to contact me via E-mail.
