Language Diversity Processing Technology Development

17 November 2009 - A Best Practice Forum on Diversity in Sharm el Sheikh, Egypt


1) Importance of Linguistic Diversity
Globally, we are experiencing a massive and rapid loss of languages and cultures. In particular, the languages and cultures of communities with very few speakers have practically no chance of surviving beyond the end of this century, and many will disappear much sooner, perhaps within the next 10 to 20 years. These losses are largely due to linguistic and cultural assimilation into the majority group, migration to the cities, and a lack of ICT support, especially in language processing technology.
ICT has great potential to help enrich linguistic diversity. We shall work to preserve and strengthen linguistic diversity in Asia and around the globe in order to promote intellectual productivity and to support innovation and sustainable development. The creation and availability of local language resources (LRs) will contribute to this objective.
There are a number of ways to develop language resources for all the world's languages, e.g. building local computing capacity and running language maintenance and language revitalization programs. These programs include creating LRs and language processing toolkits. Such activities enable language computerization, which helps keep the languages alive and present in digital form.
To develop these programs, it is necessary to establish networks and exchange programs for researchers throughout the world. These would involve study, documentation, and the assembly of a rich archive of materials that helps preserve as much as possible of each language and way of life in digital and other formats. Diversity breeds diversity: it seeds insights in the fields of science, art and literature.

2) Organization for Distributed Collaboration
Some portals should be established to catalogue language resources (LRs hereafter) in Asia and globally. There may well be multiple such portals mirroring and collaborating with each other, rather than one predominant portal. Each such portal may focus on some particular countries or regions, but their distributed collaboration should constitute a country-neutral framework for sharing metadata of LRs. National, regional, and international organizations and initiatives which may provide such portal services are as follows:
 Indonesia Language Technology Research Community (ILT-RC)
 Fostering Language Resource Network (FLaReNet)
 AFNLP (Asian Federation of Natural Language Processing)
 ALRN (Asian Language Resources Network)
 ADD (Asian Applied NLP for Linguistics Diversity and LR Development)
 APT (Asia-Pacific Telecommunity)
 ChineseLDC
 CIIL (Central Institute for Indian Languages)
 GSK (Japanese LRs association)
 Hanoi University of Technology, Faculty of Information Technology
 KORTERM (Korean Terminology Research Center for Language and Knowledge Engineering)
 MIMOS (Malaysia)
 NLP center of Myanmar
 Oriental COCOSDA
 PAN Localization project
 TBIT-BPPT (Indonesian Language Technology Center)
 University of Indonesia
 TDIL (Technology Development for Indian Languages)
 UCSC (University of Colombo School of Computing)
 Pakistan University NUCES
 A-STAR (Asian Speech Translation Advance Research Consortium)
 Linguistic Data Consortium (LDC)
 European Linguistic Data Association (ELDA)

3) Shared Technical Standards and Practices for Interoperability
Some technical standards and practices should be shared among these portals for the sake of interoperability. These standards should concern cataloguing metadata on LRs, baseline annotation, treatment of proprietary encoding/annotation, specifications of technical documents attached to LRs, methods for dealing with copyrights, and licensing practices. Such technical standards may be established through shared LRs, such as parallel corpora, Asian WordNet, and Japanese WordNet, and through joint projects on cross-language information retrieval (IR), machine translation, and so forth.
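As an illustration only, the kind of shared cataloguing metadata described above could be sketched as follows. This is a minimal sketch, not a proposed standard: the record fields, the identifier scheme, the portal names, and the merge_catalogues helper are all hypothetical, and the rule of keeping the first record seen per identifier is just one possible way for mirrored portals to reconcile entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical catalogue entry for one language resource (LR)."""
    identifier: str      # portal-independent ID (assumed scheme)
    title: str
    language: str        # ISO 639-3 code, e.g. "ind" for Indonesian
    resource_type: str   # e.g. "parallel corpus", "wordnet"
    license: str         # licensing terms as free text
    source_portal: str   # portal that first catalogued the record


def merge_catalogues(*catalogues):
    """Combine records from several portals.

    Records are keyed by identifier; when two portals list the same
    identifier, the record from the earlier catalogue is kept.
    """
    merged = {}
    for catalogue in catalogues:
        for record in catalogue:
            merged.setdefault(record.identifier, record)
    return merged


# Usage with made-up records from two hypothetical portals.
portal_a = [
    ResourceRecord("lr-001", "Example parallel corpus", "ind",
                   "parallel corpus", "research-only", "portal-a"),
]
portal_b = [
    ResourceRecord("lr-001", "Example parallel corpus", "ind",
                   "parallel corpus", "research-only", "portal-b"),
    ResourceRecord("lr-002", "Example wordnet", "tha",
                   "wordnet", "open", "portal-b"),
]
shared = merge_catalogues(portal_a, portal_b)
```

Keying records on a portal-independent identifier is what lets several mirrored portals exchange metadata without any one of them acting as the predominant catalogue.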

The participants of the Language Diversity Workshop shall propose multi-stakeholder projects and ideas on how to sustain such regional efforts. Potential participants include:

1) China - JIA Yanmin - Chinese Academy of Sciences
2) China - WU Jian - Chinese Academy of Sciences
3) China - XU Bo - Chinese Academy of Sciences
4) India - Om VIKAS - Indian Institute of Information Technology and Management
5) Indonesia - Hammam Riza - BPPT
6) Indonesia - S. Moedjiono - Ministry of Communication and Information Technology
7) Indonesia - Mirna Adriani - University of Indonesia
8) Japan - HASIDA Koiti - AIST
9) Japan - KODAMA Shigeki - Nagaoka Univ. of Technology
10) Japan - NAKAMURA Satoshi - NiCT
11) Japan - ISAHARA Hitoshi - NiCT
12) Japan - MIKAMI Yoshiki - Nagaoka Univ. of Technology
13) Japan - TOKUNAGA Takenobu - Tokyo Institute of Technology
14) Korea - YOON Juntae - Daumsoft
15) Malaysia - Zaharin Bin Yusoff - MIMOS
16) Mongolia - Purev JAIMAI - National Univ. of Mongolia
17) Myanmar - Thein Oo - Myanmar Computer Federation
18) Myanmar - Wunna Ko Ko - AWZAR Co. Ltd.
19) Nepal - Amar Gurung - Madan Puraskar Pustakalaya
20) Pakistan - Sarmad HUSSAIN - FAST National Univ.
21) Sri Lanka - Ruvan Weerasinghe - Univ. of Colombo School of Computing
22) Sri Lanka - S. T. Nandasara - Univ. of Colombo School of Computing
23) Thailand - Virach Sornlertlamvanich - NICT Asia Research Center
24) Vietnam - HUYNH Quyet Thang - Hanoi Univ. of Technology
25) USA - Heather Simpson - LDC
26) Italy - CALZOLARI Nicoletta - FLaReNet
27) France - Jamel Mustefa - ELDA