The 5th Workshop on Open-Source Arabic Corpora and Processing Tools
with Shared Tasks on Quran QA and Fine-Grained Hate Speech Detection
Marseille, France. 20th June 2022. Co-located with LREC 2022
The OSACT 2022 Proceedings are available at http://www.lrec-conf.org/proceedings/lrec2022/workshops/OSACT/index.html
Given the success of the first, second, third, and fourth workshops on Open-Source Arabic Corpora and Corpora Processing Tools (OSACT) at LREC 2014, LREC 2016, LREC 2018, and LREC 2020, the fifth workshop aims to encourage researchers and practitioners of Arabic language technologies, including computational linguistics (CL), natural language processing (NLP), and information retrieval (IR), to share and discuss their latest research efforts, corpora, and tools. The workshop will also give special attention to Multilingualism and Language Technology for All, which is one of the LREC 2022 hot topics. In addition to the general topics of CL, NLP, and IR, the workshop will place special emphasis on two shared tasks: Quran QA and Fine-Grained Hate Speech Detection.
In the NLP, CL, and IR communities, Arabic is considered relatively resource-poor compared to English. This situation was thought to be the reason for the limited number of corpus-based studies in Arabic. However, recent years have witnessed the emergence of new, freely available Modern Standard Arabic (MSA) corpora and, to a lesser extent, Arabic processing tools.
This workshop follows in the footsteps of previous editions of OSACT in providing a forum for researchers to share and discuss their ongoing work. It is timely given the continued rise in research projects focusing on Arabic language resources.
Language Resources:
Tools and Technologies:
Issues in the design, construction and use of Arabic LRs: text, speech, sign, gesture, image, in single or multimodal/multimedia data:
Submission deadline: April 10, 2022
Notification of acceptance: May 1, 2022
Camera-ready manuscripts: May 25, 2022
Workshop date: June 20, 2022
The language of the workshop is English, and submissions should follow the LREC 2022 paper submission instructions (https://lrec2022.lrec-conf.org/en/submission2020/authors-kit/). All papers will be peer reviewed, possibly by three independent referees. Papers must be submitted electronically in PDF format to the START system.
When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research.
Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).
Your submission must:
Submission link: START Page
Speaker: Hassan Sawaf, AIXplain.com
Title: A proposal to accelerate innovation for Arabic Speech and Language Processing
Abstract:
The innovation rate for Arabic Speech and Language Processing in the last 20 years has steadily been increasing, leading to technologies achieving great results. Yet, there are challenges that need to be addressed to improve the speed even further. This starts with resources (across data, tools, and people), alignment on tasks (task definition, metrics and benchmarks), and broad engagement of diverse stakeholders (academia, industry and policy makers). Hassan will give a brief reflection on his past work on Arabic Speech and Language Processing and will suggest ways for the community to engage with each other to push innovation much further.
1. TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation by El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed
2. Detecting Users Prone to Spread Fake News on Arabic Twitter by Zien Sheikh Ali, Abdulaziz Al-Ali and Tamer Elsayed
3. AraSAS: The Open Source Arabic Semantic Tagger by Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson and Nizar Habash
4. AraNPCC: The Arabic Newspaper COVID-19 Corpus by Abdulmohsen Al-Thubaity, Sakhar Alkhereyf and Alia O. Bahanshal
5. Pre-trained Models or Feature Engineering: The Case of Dialectal Arabic by Kathrein Abu Kwaik, Stergios Chatzikyriakidis and Simon Dobnik
6. A Context-free Arabic Emoji Sentiment Lexicon (CF-Arab-ESL) by Shatha Ali A. Hakami, Robert Hendley and Phillip Smith
7. Sa`7r: A Saudi Dialect Irony Dataset by Halah AlMazrua, Najla AlHazzani, Amaal AlDawod, Lama AlAwlaqi, Noura AlReshoudi, Hend Al-Khalifa and Luluh AlDhubayi
8. Classifying Arabic Crisis Tweets using Data Selection and Pre-trained Language Models by Alaa Alharbi and Mark Lee
OSACT 5 (The 5th Workshop on Open-Source Arabic Corpora and Processing Tools)
Session 1 | Chair | Authors | Affiliation of the 1st author | Talk
9:00-9:10 | Hamdy Mubarak (in-person) & Abdulmohsen Al-Thubaity (Remotely) | Workshop Opening | Welcome and Introduction by Workshop Chairs
9:10-9:50 | | Hassan Sawaf | aiXplain Inc., US | Keynote (A proposal to accelerate innovation for Arabic Speech and Language Processing)
9:50-10:10 | | El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed | University of British Columbia (UBC), Canada | TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation
10:10-10:30 | | Zien Sheikh Ali, Abdulaziz Al-Ali and Tamer Elsayed | Qatar University, Qatar | Detecting Users Prone to Spread Fake News on Arabic Twitter
Session 2
11:00-11:20 | Hamdy Mubarak (in-person) & Hend Al-Khalifa (Remotely) | Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson and Nizar Habash | Lancaster University, UK | AraSAS: The Open Source Arabic Semantic Tagger
11:20-11:40 | | Abdulmohsen Al-Thubaity, Sakhar Alkhereyf and Alia O. Bahanshal | King Abdulaziz City for Science and Technology (KACST), Saudi Arabia | AraNPCC: The Arabic Newspaper COVID-19 Corpus
11:40-12:00 | | Kathrein Abu Kwaik, Stergios Chatzikyriakidis and Simon Dobnik | Gothenburg University, Sweden | Pre-trained Models or Feature Engineering: The Case of Dialectal Arabic
12:00-12:20 | | Shatha Ali A. Hakami, Robert Hendley and Phillip Smith | University of Birmingham, UK | A Context-free Arabic Emoji Sentiment Lexicon (CF-Arab-ESL)
12:20-12:40 | | Halah AlMazrua, Najla AlHazzani, Amaal AlDawod, Lama AlAwlaqi, Noura AlReshoudi, Hend Al-Khalifa and Luluh AlDhubayi | King Saud University, Saudi Arabia | Sa`7r: A Saudi Dialect Irony Dataset
12:40-13:00 | | Alaa Alharbi and Mark Lee | University of Birmingham, UK | Classifying Arabic Crisis Tweets using Data Selection and Pre-trained Language Models
Session 3
14:00-14:20 | Hamdy Mubarak (in-person) & Tamer Elsayed (Remotely) | Tamer Elsayed | Qatar University, Qatar | Qur'an QA 2022: Task Overview
14:20-14:30 | | Damith Dola Mullage Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani and Ruslan Mitkov | University of Wolverhampton, UK | DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain
14:30-14:40 | | Esha Aftab and Muhammad Kamran Malik | Punjab University, Pakistan | eRock at Qur’an QA 2022: Contemporary Deep Neural Networks for Qur’an based Reading Comprehension Question Answers
14:40-14:50 | | Ali Mostafa and Omar Mohamed | Helwan University, Egypt | GOF at Qur'an QA 2022: Towards an Efficient Question Answering For The Holy Qu'ran In The Arabic Language Using Deep Learning-Based Approach
14:50-15:00 | | Youssef MELLAH, Ibtissam Touahri, Zakaria Kaddari, Zakaria Haja, Jamal Berrich and Toumi Bouchentouf | Superior School of Technology, Morocco | LARSA22 at Qur’an QA 2022: Text-to-Text Transformer for Finding Answers to Questions from Qur’an
15:00-15:10 | | Abdullah Alsaleh, Saud Althabiti, Ibtisam K. Alshammari, Sarah Alnefaie, Sanaa Alowaidi, Alaa Fahad Alsaqer, Eric Atwell, Abdulrahman Altahhan and Mohammad Ammar Alsalka | University of Leeds, UK | LK2022 at Qur'an QA 2022: Simple Transformers Model for Finding Answers to Questions from Qur'an
15:10-15:20 | | Nikhil Singh | - | niksss at Qur'an QA 2022: A Heavily Optimized BERT Based Model for Answering Questions from the Holy Qu'ran
15:20-15:30 | | Basem H.A. Ahmed, Motaz Saad and Eshrag A. Refaee | Alaqsa University, Palestine | QQATeam at Qur’an QA 2022: Fine-Tunning Arabic QA Models for Qur’an QA Task
15:30-15:40 | | Amr Keleg and Walid Magdy | University of Edinburgh, UK | SMASH at Qur’an QA 2022: Creating Better Faithful Data Splits for Low-resourced Question Answering Scenarios
15:40-15:50 | | Ahmed Wasfey Sleem, Eman Mohammed lotfy Elrefai, Marwa Mohammed Matar and Haq Nawaz | Tactful AI, Egypt | Stars at Qur'an QA 2022: Building Automatic Extractive Question Answering Systems for the Holy Qur'an with Transformer Models and Releasing a New Dataset
15:50-16:00 | | Mohammed Alaa Elkomy and Amany M. Sarhan | Tanta University, Egypt | TCE at Qur'an QA 2022: Arabic Language Question Answering Over Holy Qur'an Using a Post-Processed Ensemble of BERT-based Models
Session 4
16:30-16:40 | Hamdy Mubarak (in-person) & Hend Al-Khalifa (Remotely) | Hamdy Mubarak | Qatar Computing Research Institute, Qatar | Fine-Grained Hate Speech Detection Shared Task overview
16:40-16:50 | | Ali Mostafa, Omar Mohamed and Ali Ashraf | Helwan University, Egypt | GOF at Arabic Hate Speech 2022: Breaking The Loss Function Convention For Data-Imbalanced Arabic Offensive Text Detection
16:50-17:00 | | Mohamed Aziz Bennessir, Malek Rhouma, Hatem Haddad and Chayma Fourati | iCompass, Tunisia | iCompass at Arabic Hate Speech 2022: Detect Hate Speech Using QRNN and Transformers
17:00-17:10 | | Angel Felipe Magnossão de Paula, Paolo Rosso, Imene Bensalem and Wajdi Zaghouani | Universidad Politécnica de València, Spain | UPV at the Arabic Hate Speech 2022 Shared Task: Offensive Language and Hate Speech Detection using Transformers and Ensemble Models
17:10-17:20 | | Badr AlKhamissi and Mona Diab | Meta, US | Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification
17:20-17:30 | | Kirollos Makram, Kirollos George Nessim, Malak Emad Abd-Almalak, Shady Zekry Roshdy, Seif Hesham Salem, Fady Fayek Thabet and Ensaf Hussien Mohamed | Helwan University, Egypt | CHILLAX - at Arabic Hate Speech 2022: A Hybrid Machine Learning and Transformers based Model to Detect Arabic Offensive and Hate Speech
17:30-17:40 | | Ahmad Shapiro, Ayman Khalafallah and Marwan Torki | Alexandria University, Egypt | AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify
17:40-17:50 | | Nehal Elkaref and Mervat Abu-Elkheir | German University in Cairo, Egypt | GUCT at Arabic Hate Speech 2022: Towards a Better Isotropy for Hatespeech Detection
17:50-18:00 | | Salaheddin Alzubi, Thiago Castro Ferreira, Lucas Pavanelli and Mohamed Al-Badrashiny | aiXplain Inc., US | aiXplain at Arabic Hate Speech 2022: An Ensemble Based Approach to Detecting Offensive Tweets
18:00-18:05 | Closing + Best paper award
OSACT2022 Best Paper Award is awarded to:
(TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation) by El Moatez Billah Nagoudi, AbdelRahim Elmadany and Muhammad Abdul-Mageed
Quran QA shared task Awards:
Fine-Grained Hate Speech Detection Shared Task winners are:
Congratulations to all winning teams 👏🎉