Workshop Description

Given the success of the first, second, third, and fourth workshops on Open-Source Arabic Corpora and Corpora Processing Tools (OSACT) at LREC 2014, LREC 2016, LREC 2018, and LREC 2020, the fifth workshop aims to encourage researchers and practitioners of Arabic language technologies, including computational linguistics (CL), natural language processing (NLP), and information retrieval (IR), to share and discuss their latest research efforts, corpora, and tools. The workshop will also give special attention to Multilingualism and Language Technology for All, which is one of the LREC 2022 hot topics. In addition to the general topics of CL, NLP, and IR, the workshop will place special emphasis on two shared tasks: Quran QA and Fine-Grained Hate Speech Detection.

Shared Tasks

Quran QA Shared Task


For more information about the Quran QA shared task, please visit this website

Important dates:

  • 6 February 2022: Train/dev set release
  • 26-29 March 2022: Runs submission (Test set available)
  • 31 March 2022: Announcing runs results
  • 10 April 2022: Shared-task paper submission deadline
  • 1 May 2022: Notification of acceptance
  • 25 May 2022: Camera-ready submission of manuscripts

Fine-grained detection of hate speech on Arabic Twitter Shared Task


    For more information about the Fine-grained detection of hate speech on Arabic Twitter shared task, please visit this website

    Important dates:

  • 6 February 2022: Train/dev set release
  • 26-29 March 2022: Runs submission (Test set available)
  • 31 March 2022: Announcing runs results
  • 10 April 2022: Shared-task paper submission deadline
  • 1 May 2022: Notification of acceptance
  • 25 May 2022: Camera-ready submission of manuscripts

    Motivation and Topics of interest

    In the NLP, CL, and IR communities, Arabic is considered to be relatively resource-poor compared to English. This situation was thought to be the reason for the limited number of corpus-based studies in Arabic. However, the past years have witnessed the emergence of new, largely free Modern Standard Arabic (MSA) corpora and, to a lesser extent, Arabic processing tools.

    This workshop follows in the footsteps of previous editions of OSACT to provide a forum for researchers to share and discuss their ongoing work. This workshop is timely given the continued rise in research projects focusing on Arabic Language Resources.

    Language Resources:

  • Pre-trained Arabic language models and their applications.
  • Surveying and evaluating the design of available Arabic corpora and their associated processing tools.
  • Making available new annotated corpora for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
  • Evaluating the use of crowdsourcing platforms for Arabic data annotation.
  • Open source Arabic processing toolkits.

    Tools and Technologies:

  • Language education, e.g., L1 and L2.
  • Language modeling and pre-trained models.
  • Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
  • Sentiment analysis, dialect identification, and text classification
  • Dialect translation
  • Fake news detection
  • Web and social media search and analytics

    Issues in the design, construction and use of Arabic LRs (text, speech, sign, gesture, image, in single or multimodal/multimedia data):

  • Guidelines, standards, best practices and models for LRs interoperability
  • Methodologies and tools for LRs construction and annotation
  • Methodologies and tools for extraction and acquisition of knowledge
  • Ontologies, terminology and knowledge representation
  • LRs and Semantic Web (including Linked Data, Knowledge Graphs, etc.)

    Important Dates

    Submission deadline: April 10, 2022
    Notification of acceptance: May 1, 2022
    Camera-ready submission of manuscripts: May 25, 2022
    Workshop date: June 20, 2022

    Submission guidelines

    The language of the workshop is English, and submissions should follow the LREC 2022 paper submission instructions (https://lrec2022.lrec-conf.org/en/submission2020/authors-kit/). All papers will be peer reviewed, possibly by three independent referees. Papers must be submitted electronically in PDF format to the START system.

    When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of your research.

    Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).

    Your submission must:

  • consist of 4 to 8 pages, references and appendix excluded (the number of appendix pages should remain reasonable)
  • comply strictly with the LREC stylesheet
  • be formatted as a PDF

    Submissions are NOT anonymous. Please check the FAQ before asking a question.

    Submission link: START Page

    Committees

    Organizing Committee

    • Hend Al-Khalifa, King Saud University, KSA
    • Walid Magdy, University of Edinburgh, UK
    • Kareem Darwish, aiXplain Inc., US
    • Tamer Elsayed, Qatar University, Qatar
    • Hamdy Mubarak, Qatar Computing Research Institute, Qatar
    • Abdulmohsen Al-Thubaity, KACST, KSA

    Programme Committee

    • Abdelmajid Ben-Hamadou, Sfax University, Tunisia
    • AbdelRahim Elmadany, The University of British Columbia, Canada
    • Abdullah Alrajeh, King Abdulaziz City for Science and Technology, KSA
    • Abdulrahman Almuhareb, King Abdulaziz City for Science and Technology, KSA
    • Adel Alshehri, King Abdulaziz City for Science and Technology, KSA
    • Alexis Nasr, University of Marseille, France
    • Aloulou Chafik, Université de Sfax, Tunisia
    • Areeb Alowisheq, Saudi Data and Artificial Intelligence Authority, KSA
    • Azzeddine Mazroui, University Mohamed I, Morocco
    • Bassam Haddad, University of Petra, Jordan
    • El Moatez Billah Nagoudi, The University of British Columbia, Canada
    • Fatima Haouari, Qatar University, Qatar
    • Fethi Bougares, Le Mans University, France
    • Fouzi Harrag, Ferhat Abbas University, Algeria
    • Hamada Nayel, Benha University, Egypt
    • Ibrahim Abu Farha, University of Edinburgh, Scotland
    • Imed Zitouni, Google, USA
    • Karim Bouzoubaa, Mohammad V University, Morocco
    • Khaled Shaalan, The British University in Dubai, UAE
    • Maram Hasanain, Qatar University, Qatar
    • Mourad Abbas, HCLA, Algeria
    • Mucahid Kutlu, TOBB University, Turkey
    • Muhammad Abdul-Mageed, The University of British Columbia, Canada
    • Mustafa Jarrar, Bir Zeit University, Palestine
    • Nada Ghneim, Higher Institute for Applied Sciences and Technology, Syria
    • Nizar Habash, New York University Abu Dhabi, UAE
    • Nora Al-Twairesh, King Saud University, KSA
    • Omar Trigui, University of Sousse, Tunisia
    • Reem Suwaileh, Qatar University, Qatar
    • Sahar Ghannay, LIMSI, France
    • Sakhar Alkhereyf, King Abdulaziz City for Science and Technology, KSA
    • Salam Khalifa, New York University Abu Dhabi, UAE
    • Salima Harrat, École Normale Supérieure (Bouzaréah), Algeria
    • Salima Mdhaffar, Le Mans University, France
    • Samhaa R. El-Beltagy, Newgiza University, Egypt
    • Saud Alashri, King Abdulaziz City for Science and Technology, KSA
    • Shammur Absar Chowdhury, Qatar Computing Research Institute, Qatar
    • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar
    • Waleed Alsanie, King Abdulaziz City for Science and Technology, KSA
    • Watheq Mansour, Qatar University, Qatar
    • Wissam Antoun, American University of Beirut, Lebanon
    • Younes Samih, Heinrich Heine Universität Düsseldorf, Germany

    Keynote Speaker

    Speaker: Hassan Sawaf, AIXplain.com
    Title: A proposal to accelerate innovation for Arabic Speech and Language Processing

    Abstract:
    The innovation rate for Arabic Speech and Language Processing has been increasing steadily over the last 20 years, leading to technologies achieving great results. Yet, there are challenges that need to be addressed to improve the speed even further. This starts with resources (across data, tools, and people), alignment on tasks (task definition, metrics and benchmarks), and broad engagement of diverse stakeholders (academia, industry and policy makers). Hassan will give a brief reflection on his past work on Arabic Speech and Language Processing and will suggest ways for the community to engage with each other to push innovation much further.

    Accepted Papers (Main Workshop)


    1. TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation by El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed

    2. Detecting Users Prone to Spread Fake News on Arabic Twitter by Zien Sheikh Ali, Abdulaziz Al-Ali and Tamer Elsayed

    3. AraSAS: The Open Source Arabic Semantic Tagger by Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson and Nizar Habash

    4. AraNPCC: The Arabic Newspaper COVID-19 Corpus by Abdulmohsen Al-Thubaity, Sakhar Alkhereyf and Alia O. Bahanshal

    5. Pre-trained Models or Feature Engineering: The Case of Dialectal Arabic by Kathrein Abu Kwaik, Stergios Chatzikyriakidis and Simon Dobnik

    6. A Context-free Arabic Emoji Sentiment Lexicon (CF-Arab-ESL) by Shatha Ali A. Hakami, Robert Hendley and Phillip Smith

    7. Sa`7r: A Saudi Dialect Irony Dataset by Halah AlMazrua, Najla AlHazzani, Amaal AlDawod, Lama AlAwlaqi, Noura AlReshoudi, Hend Al-Khalifa and Luluh AlDhubayi

    8. Classifying Arabic Crisis Tweets using Data Selection and Pre-trained Language Models by Alaa Alharbi and Mark Lee

    Workshop Program

    OSACT 5 (The 5th Workshop on Open-Source Arabic Corpora and Processing Tools)
    Time | Chair | Authors | Affiliation of the 1st author | Talk
    Session 1
    9:00-9:10 | Hamdy Mubarak (in-person) & Abdulmohsen Al-Thubaity (Remotely) | Workshop Opening | | Welcome and Introduction by Workshop Chairs
    9:10-9:50 | | Hassan Sawaf | aiXplain Inc., US | Keynote (A proposal to accelerate innovation for Arabic Speech and Language Processing)
    9:50-10:10 | | El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed | University of British Columbia (UBC), Canada | TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation
    10:10-10:30 | | Zien Sheikh Ali, Abdulaziz Al-Ali and Tamer Elsayed | Qatar University, Qatar | Detecting Users Prone to Spread Fake News on Arabic Twitter
    Session 2
    11:00-11:20 | Hamdy Mubarak (in-person) & Hend Al-Khalifa (Remotely) | Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson and Nizar Habash | Lancaster University, UK | AraSAS: The Open Source Arabic Semantic Tagger
    11:20-11:40 | | Abdulmohsen Al-Thubaity, Sakhar Alkhereyf and Alia O. Bahanshal | King Abdulaziz City for Science and Technology (KACST), Saudi Arabia | AraNPCC: The Arabic Newspaper COVID-19 Corpus
    11:40-12:00 | | Kathrein Abu Kwaik, Stergios Chatzikyriakidis and Simon Dobnik | Gothenburg University, Sweden | Pre-trained Models or Feature Engineering: The Case of Dialectal Arabic
    12:00-12:20 | | Shatha Ali A. Hakami, Robert Hendley and Phillip Smith | University of Birmingham, UK | A Context-free Arabic Emoji Sentiment Lexicon (CF-Arab-ESL)
    12:20-12:40 | | Halah AlMazrua, Najla AlHazzani, Amaal AlDawod, Lama AlAwlaqi, Noura AlReshoudi, Hend Al-Khalifa and Luluh AlDhubayi | King Saud University, Saudi Arabia | Sa`7r: A Saudi Dialect Irony Dataset
    12:40-13:00 | | Alaa Alharbi and Mark Lee | University of Birmingham, UK | Classifying Arabic Crisis Tweets using Data Selection and Pre-trained Language Models
    Session 3
    14:00-14:20 | Hamdy Mubarak (in-person) & Tamer Elsayed (Remotely) | Tamer Elsayed | Qatar University, Qatar | Qur'an QA 2022: Task Overview
    14:20-14:30 | | Damith Dola Mullage Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani and Ruslan Mitkov | University of Wolverhampton, UK | DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain
    14:30-14:40 | | Esha Aftab and Muhammad Kamran Malik | Punjab University, Pakistan | eRock at Qur’an QA 2022: Contemporary Deep Neural Networks for Qur’an based Reading Comprehension Question Answers
    14:40-14:50 | | Ali Mostafa and Omar Mohamed | Helwan University, Egypt | GOF at Qur'an QA 2022: Towards an Efficient Question Answering For The Holy Qu'ran In The Arabic Language Using Deep Learning-Based Approach
    14:50-15:00 | | Youssef MELLAH, Ibtissam Touahri, Zakaria Kaddari, Zakaria Haja, Jamal Berrich and Toumi Bouchentouf | Superior School of Technology, Morocco | LARSA22 at Qur’an QA 2022: Text-to-Text Transformer for Finding Answers to Questions from Qur’an
    15:00-15:10 | | Abdullah Alsaleh, Saud Althabiti, Ibtisam K. Alshammari, Sarah Alnefaie, Sanaa Alowaidi, Alaa Fahad Alsaqer, Eric Atwell, Abdulrahman Altahhan and Mohammad Ammar Alsalka | University of Leeds, UK | LK2022 at Qur'an QA 2022: Simple Transformers Model for Finding Answers to Questions from Qur'an
    15:10-15:20 | | Nikhil Singh | - | niksss at Qur'an QA 2022: A Heavily Optimized BERT Based Model for Answering Questions from the Holy Qu'ran
    15:20-15:30 | | Basem H.A. Ahmed, Motaz Saad and Eshrag A. Refaee | Alaqsa University, Palestine | QQATeam at Qur’an QA 2022: Fine-Tunning Arabic QA Models for Qur’an QA Task
    15:30-15:40 | | Amr Keleg and Walid Magdy | University of Edinburgh, UK | SMASH at Qur’an QA 2022: Creating Better Faithful Data Splits for Low-resourced Question Answering Scenarios
    15:40-15:50 | | Ahmed Wasfey Sleem, Eman Mohammed lotfy Elrefai, Marwa Mohammed Matar and Haq Nawaz | Tactful AI, Egypt | Stars at Qur'an QA 2022: Building Automatic Extractive Question Answering Systems for the Holy Qur'an with Transformer Models and Releasing a New Dataset
    15:50-16:00 | | Mohammed Alaa Elkomy and Amany M. Sarhan | Tanta University, Egypt | TCE at Qur'an QA 2022: Arabic Language Question Answering Over Holy Qur'an Using a Post-Processed Ensemble of BERT-based Models
    Session 4
    16:30-16:40 | Hamdy Mubarak (in-person) & Hend Al-Khalifa (Remotely) | Hamdy Mubarak | Qatar Computing Research Institute, Qatar | Fine-Grained Hate Speech Detection Shared Task overview
    16:40-16:50 | | Ali Mostafa, Omar Mohamed and Ali Ashraf | Helwan University, Egypt | GOF at Arabic Hate Speech 2022: Breaking The Loss Function Convention For Data-Imbalanced Arabic Offensive Text Detection
    16:50-17:00 | | Mohamed Aziz Bennessir, Malek Rhouma, Hatem Haddad and Chayma Fourati | iCompass, Tunisia | iCompass at Arabic Hate Speech 2022: Detect Hate Speech Using QRNN and Transformers
    17:00-17:10 | | Angel Felipe Magnossão de Paula, Paolo Rosso, Imene Bensalem and Wajdi Zaghouani | Universidad Politécnica de València, Spain | UPV at the Arabic Hate Speech 2022 Shared Task: Offensive Language and Hate Speech Detection using Transformers and Ensemble Models
    17:10-17:20 | | Badr AlKhamissi and Mona Diab | Meta, US | Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification
    17:20-17:30 | | Kirollos Makram, Kirollos George Nessim, Malak Emad Abd-Almalak, Shady Zekry Roshdy, Seif Hesham Salem, Fady Fayek Thabet and Ensaf Hussien Mohamed | Helwan University, Egypt | CHILLAX - at Arabic Hate Speech 2022: A Hybrid Machine Learning and Transformers based Model to Detect Arabic Offensive and Hate Speech
    17:30-17:40 | | Ahmad Shapiro, Ayman Khalafallah and Marwan Torki | Alexandria University, Egypt | AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify
    17:40-17:50 | | Nehal Elkaref and Mervat Abu-Elkheir | German University in Cairo, Egypt | GUCT at Arabic Hate Speech 2022: Towards a Better Isotropy for Hatespeech Detection
    17:50-18:00 | | Salaheddin Alzubi, Thiago Castro Ferreira, Lucas Pavanelli and Mohamed Al-Badrashiny | aiXplain Inc., US | aiXplain at Arabic Hate Speech 2022: An Ensemble Based Approach to Detecting Offensive Tweets
    18:00-18:05 | | | | Closing + Best paper award

    OSACT Awards


    OSACT2022 Best Paper Award is awarded to:
    (TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation) by El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed

    Quran QA shared task Awards:

    1. 1st place ($500): TCE at Qur’an QA 2022: Arabic Language Question Answering Over Holy Qur’an Using a Post-Processed Ensemble of BERT-based Models by Mohammed ElKomy and Amany M. Sarhan
    2. 2nd place ($350): QQATeam at Qur’an QA 2022: Fine-Tunning Arabic QA Models for Qur’an QA Task by Basem H. Ahmed, Motaz K. Saad and Eshrag A. Refaee
    3. 3rd place ($250): GOF at Qur’an QA 2022: Towards an Efficient Question Answering For The Holy Qu’ran In The Arabic Language Using Deep Learning-Based Approach by Aly Mostafa and Omar Mohamed
    4. Best paper ($150): DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain by Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani and Ruslan Mitkov

    Fine-Grained Hate Speech Detection Shared Task winners are:

    1. Sub-Task A: GOF at Arabic Hate Speech 2022: Breaking The Loss Function Convention For Data-Imbalanced Arabic Offensive Text Detection by Aly Mostafa, Omar Mohamed and Ali Ashraf
    2. Sub-Task B & C: iCompass at Arabic Hate Speech 2022: Detect Hate Speech Using QRNN and Transformers by Mohamed Aziz Ben Nessir, Malek Rhouma, Hatem Haddad and Chayma Fourati
    3. Best Paper: Multitask learning with self-correction for hate speech classification by Badr AlKhamissi and Mona Diab
    Congratulations to all winning teams 👏🎉