Monday, October 16, 2017

NLP Research Group, CDCSIT, TU

Natural language processing (NLP) is the automatic understanding and generation of natural language by a computer. Ever since computers were invented, human-machine interaction has been an active subject of research. Many eminent researchers have done a great deal of work to bring NLP to its current stage: from simple language processors such as compilers to complex problems such as image description and captioning.

An NLP system typically works as a pipeline of stages: morphological analysis (word segmentation), lexical analysis (lexeme/word analysis), syntactic analysis (sentence structure), semantic analysis, pragmatic analysis, intermediate-form representation, and natural language generation (for example, from one language to another).
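To make the idea of a staged pipeline concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the stage functions, the toy English lexicon and the subject-verb rule are all hypothetical, and it covers just the first few stages. A real system for Nepali would need a proper tokenizer, lexicon and grammar.

```python
from typing import Callable, Dict, List

# Each stage receives the shared analysis dict and returns it with new fields.
Stage = Callable[[Dict], Dict]

# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}


def segment(doc: Dict) -> Dict:
    """Word segmentation: naive whitespace split."""
    doc["tokens"] = doc["text"].split()
    return doc


def lexical(doc: Dict) -> Dict:
    """Lexical analysis: look each token up in the toy lexicon."""
    doc["tags"] = [TOY_LEXICON.get(t.lower(), "UNK") for t in doc["tokens"]]
    return doc


def syntactic(doc: Dict) -> Dict:
    """Syntactic analysis: toy rule, a NOUN followed somewhere by a VERB."""
    tags = doc["tags"]
    doc["has_subject_verb"] = (
        "NOUN" in tags and "VERB" in tags[tags.index("NOUN"):]
    )
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Dict:
    """Feed the output of each stage into the next one, in order."""
    doc: Dict = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline("The dog barks", [segment, lexical, syntactic])
print(result["tokens"], result["tags"], result["has_subject_verb"])
```

The point of this design is that each stage depends only on the output of the stages before it, so a missing early stage (for example, segmentation or stemming for Nepali) blocks everything built on top of it.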

NLP has been one of the most investigated research topics of the past decade. Top technology firms such as Google, Microsoft, Facebook and others have invested heavily in NLP research. Google has many products, such as Google Voice, Google Translate and Google Input Tools, which are available in many languages. Most of these products support the Nepali language, but not at a level comparable to English and other European languages. Research on Nepali language processing is still in its fledgling stage, i.e. there are many gaps yet to be filled.

The history of NLP in Nepali dates back to around 2004 AD, when the PAN Localization Project was jointly conducted by Madan Puraskar Pustakalaya, Kathmandu University and the linguistics department of TU (now known as Language Technology Kendra, http://ltk.org.np). However, after the completion of this project, the work was not continued at the same pace or in the same systematic and organized manner. Different individuals, organizations and institutions have been doing research in this field and often publish their results as well, but there is no easily accessible repository of such work on the internet or in any other medium.

During the same period, a few master's degree students at the Central Department of Computer Science and Information Technology, who must complete a compulsory thesis in their final semester, wrote their theses on NLP and related fields (I am one of them). Taken together, these individual theses cover almost every stage of NLP for the Nepali language. However, anyone who wants to build an NLP-based application or investigate Nepali text analysis, such as text summarization, classification, information retrieval and so on, cannot find the necessary pre-processed resources, neither on the internet nor in any other medium. This is the result of not maintaining a single repository in an accessible medium such as GitHub or another code repository. The same situation prevails in other institutions as well.

I realized this only when I came back to the same department after 4-5 years, joining as a Lecturer of CS, and tried to continue my research on text analysis. I couldn't get access to even a simple Nepali stemmer, which is the first and foremost step for any NLP task.
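For readers unfamiliar with the term, a stemmer reduces an inflected word to a base form. The sketch below shows the simplest possible approach, plain suffix stripping. It is not any existing Nepali stemmer: the suffix list is a small illustrative sample I am assuming (the plural marker and a few common postpositions), and the `stem` function and its `min_stem_len` parameter are hypothetical names.

```python
# Illustrative sample only: a plural marker and a few common postpositions.
# A usable stemmer would need a much larger, linguistically validated list.
SUFFIXES = ["हरू", "लाई", "बाट", "को", "मा", "ले"]


def stem(word: str, min_stem_len: int = 2) -> str:
    """Repeatedly strip the longest matching suffix from the end of a word.

    min_stem_len (in Unicode code points) guards against stripping a
    short word down to nothing.
    """
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so a short one does not pre-empt them.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


for w in ["घरहरूमा", "किताबको", "घर"]:
    print(w, "->", stem(w))
```

A purely rule-based approach like this will over-strip words whose base form happens to end in one of the listed suffixes, which is one reason a shared, tested resource is worth having.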

Now it's time to build a systematic and organized repository of our work and make it accessible to everyone, anywhere in the world. For this, we have initiated an online NLP research group, where anyone interested in NLP research can take part, contribute and be a proud member of the group.

If you are interested, please join us as a member on GitHub, Google Group, Facebook and Twitter. The links are as follows:

GitHub: https://github.com/tucdcsit
Gmail: cdcsittu@gmail.com
Twitter: Not yet created
Facebook: Not yet created

With best regards,
Tej Bahadur Shahi (Lecturer, CDCSIT), Member
Ashok Kumar Pant (Sr. Machine Learning Engineer), Member