Introduction to Arabic Natural Language Processing
Nizar Y. Habash
Columbia University

Abstract

This book provides system developers and researchers in natural language processing and computational linguistics with the necessary background information for working with the Arabic language. The goal is to introduce Arabic linguistic phenomena and review the state of the art in Arabic processing. The book discusses Arabic script, phonology, orthography, morphology, syntax and semantics, with a final chapter on machine translation issues. The chapter sizes correspond more or less to what is linguistically distinctive about Arabic, with morphology getting the lion's share, followed by Arabic script. No previous knowledge of Arabic is needed. This book is designed for computer scientists and linguists alike. The focus of the book is on Modern Standard Arabic; however, notes on practical issues related to Arabic dialects and languages written in the Arabic script are presented in different chapters.

Table of Contents: What is "Arabic"? / Arabic Script / Arabic Phonology and Orthography / Arabic Morphology / Computational Morphology Tasks / Arabic Syntax / A Note on Arabic Semantics / A Note on Arabic and Machine Translation

A Review in Machine Translation, by Khaled Shaalan:

The book is concise, as required by the publisher, but the contents are impressive. It takes care of very specific details such as the hamza letter mark (which can appear above or below specific letter forms; when it appears at stem-initial positions it tends to be perceived as a diacritic) and relates it to aspects of various linguistic phenomena. Moreover, the material in text boxes that frequently appear as alerts or FAQs is very interesting teaching material that directly draws the reader's attention. In particular, I admired highlights such as the FAQ: How true is "Arabic has no vowels"?
I found that the book follows a reasonable approach such that the reader can easily pursue the logical sequence not only across one chapter but also across the entire book. This means that the material is brilliantly organized in such a way that it covers the necessary breadth and depth for its intended audience. In my opinion, for anyone who wants to understand Arabic natural language processing, this book is indispensable.