Monday, October 16, 2017

NLP Research Group, CDCSIT, TU

Natural language processing (NLP) is the automatic understanding and generation of natural language by a computer or machine. Since the invention of the computer, human-machine interaction has been an active area of research. Many eminent researchers have done a great deal of work to bring NLP to its current stage: from simple language processors such as compilers to complicated problems such as image description and captioning.

NLP works as a pipeline of stages: morphological analysis (word segmentation), lexical analysis (lexeme/word analysis), syntactic analysis (sentence structure), semantic analysis, pragmatic analysis, intermediate form representation, and natural language generation (for example, from one language to another).

NLP has been one of the most investigated research topics of the past decade. Top technology firms such as Google, Microsoft, and Facebook have invested heavily in NLP research. Google has many products, such as Google Voice, Google Translate, and Google Input Tools, that are available in many languages. Most of these products include Nepali, but the support is not comparable to that for English and other European languages. Research on Nepali language processing is still in its fledgling stage, i.e. there are many gaps yet to be filled.

The history of NLP in Nepali dates back to around 2004 AD, when the PAN Localization Project was jointly conducted by Madan Puraskar Pustakalaya, Kathmandu University, and the Linguistics Department of TU (now known as Language Technology Kendra, "http://ltk.org.np"). However, after the completion of this project, the work did not continue at the same speed or in the same systematic and organized manner. Different individuals, organizations, and institutions have been doing research in this field and often publish their results as well, but there is no easily accessible repository of such work on the internet or in any other medium.

During the same period, a few master's degree students at the Central Department of Computer Science and Information Technology, who have a compulsory thesis in their final semester, wrote their theses on NLP and related fields (I am one of them). Taken together, these theses cover almost every stage of NLP for the Nepali language. However, anyone who wants to build an NLP-based application or investigate Nepali text analysis, such as text summarization, classification, or information retrieval, cannot find the necessary pre-processed resources, either on the internet or in any other medium. This is the result of not keeping the work in a single repository on an accessible platform such as GitHub. The same situation prevails in other institutions as well.

I realized this only when I came back to the same department after 4-5 years, joining as a Lecturer of CS, and tried to continue my research on text analysis. I couldn't get access to even a simple Nepali stemmer, which is one of the first steps in most NLP tasks.

Now it's time to build a systematic and organized repository of our work and make it accessible to everyone, anywhere in the world. For this, we have initiated an online NLP research group, where anyone interested in NLP research can take part, contribute, and be a proud member of the group.

If you are interested, please join us as a member on GitHub, Google Group, Facebook, and Twitter. The links are as follows:

GitHub: https://github.com/tucdcsit
Gmail: cdcsittu@gmail.com
Twitter: Not yet created
Facebook: Not yet created

With Best Regards
Tej Bahadur Shahi (Lecturer, CDCSIT), Member
Ashok Kumar Pant (Sr. Machine Learning Engineer), Member


Wednesday, September 13, 2017

GRE Frequent Word-3

warrant /n/- a legal document from a court that authorizes the authorities to arrest someone.
unwarranted /adj/- not justified (without any reason)
esteem /n/- great respect
justified /adj/- with good reasons
feigned /adj/- fake e.g. feigned interest
undisguised /adj/- real, not hidden (unfeigned) e.g. undisguised interest
undue /adj/- more than is reasonable (excessive) e.g. undue praise/criticism
contempt /n/
           - strong dislike and lack of respect e.g. unfeigned contempt
           - contempt of court
introverted /adj/
           - quiet and not willing to speak much e.g. an introverted nature
fondness /n/
            - strong liking e.g. Bush's fondness for travel is well known.
ceaseless /adj/
             - never stopping (continuous) e.g. ceaseless explorations
disdain /n/
            - contempt e.g. disdain for other art
disdainful /adj/
            - showing dislike (thinking that someone or something does not deserve your interest)
deceive /v/
      - to keep the truth hidden from others for your own advantage e.g. deceive yourself
wanderings /n/
       - time spent travelling around e.g. wanderings around America
self-proclaimed /adj/
        - self-declared e.g. Devkota was a self-proclaimed poet of the people.

preconceptions /n/
       - ideas formed before having experience e.g. The preconception about earthquakes was wrong.
at large /phr./
       - as a whole / on many different subjects / not yet captured e.g. The criminal was at large.
rambling and unconstrained /phr./
      - very broad
rambling /adj/
      - too long and confusing
forays /n/
      - journeys made in order to explore
      - raids / first attempts
realm /n/
      - an area of interest
aphoristic /adj/
      - expressing a general truth in a short, memorable way
tantamount /adj/
      - having the same effect as something e.g. Her refusal to answer is tantamount to an admission of guilt.
exert /v/
      - to use something such as power or authority in order to make something happen
reclusive /adj/
      - preferring isolation e.g. a reclusive person

Good luck with your GRE preparation. See you next time.




Saturday, August 19, 2017

Data Mining and Data Warehousing: Issues and Challenges


What is data?

Data is a representation of facts, concepts, or instructions in a formal manner suitable for communication, interpretation, or processing by human beings or by computers.

What is Data Mining?

Word meaning:
· The practice of examining large databases in order to generate new information.
Definition:
· The art/science of extracting non-trivial, implicit, previously unknown, valuable, and potentially useful information from a large database.
 

An Introduction to Data mining and data warehousing can be found here

Wednesday, August 9, 2017

Distributed System: A shared Approach

We are living in the age of information. The volume of information has grown exponentially, and this is causing real hardship for the IT industry as well as academia. We need new techniques and tools to manage large volumes of data, i.e. big data. From this perspective, the concept of distributed systems emerged. The primary objectives of this concept were to reduce computation time on large amounts of data and to share resources. Later on, grid computing, a more specialized form of distributed computing, became a buzzword in the IT field. At present, these concepts have merged into newer ones such as big data, machine learning, and cloud computing. Nobody wants to buy a mainframe or rack server; instead, they buy or lease the required resources on cloud infrastructure managed by Amazon or Microsoft Azure. It costs less and requires very little time for setup and configuration.

In this regard, I have prepared a short presentation on Introduction to Distributed Systems, from a lecture given at CDCSIT, TU, in the first semester. The PowerPoint slides can be reached here. The reference used in preparing these slides was the textbook "Operating System Concepts" by Abraham Silberschatz.

Saturday, August 5, 2017

Recommendation system: A collaborative approach

A recommendation system analyzes your preferences and automatically suggests similar items or products that you may be interested in.
Examples include movie recommendation systems such as Netflix and product recommendation systems such as Amazon.

In this tutorial, I will talk about a food item recommendation system that uses the collaborative approach, with an example.


The collaborative approach compares the preferences of different users and measures the similarity between them. These similarity measurements are then used as the recommendation criteria.

This can be illustrated with the following figure:
[Figure: collaborative filtering]

Let's consider 5 items and 4 users with the following ratings ("?" marks an unknown rating):



Items          Ram   Shyam   Hari   Gopal
MoMo           5     0       0      0
Pizza          5     0       ?      0
Biriyani       ?     2       5      ?
Noodles        0     4       0      4
Chicken Roll   0     5       0      ?

Intuitively, we can see that the users "Ram" and "Shyam" are more dissimilar than the other pairs of users. What we need is an algorithm that measures the similarity between users so that this similarity can be used for recommendation purposes. Let's say ram(5, 5, ?, 0, 0) is the rating vector for user Ram, shyam(0, 0, 2, 4, 5) is the one for Shyam, and so on.

Similarity measure by Jaccard similarity: The Jaccard similarity between the set of items A rated by one user and the set of items B rated by another is defined as:
sim(A, B) = |A ∩ B| / |A ∪ B|
For example, Ram has rated 4 items and Shyam has rated all 5, so the Jaccard similarity between Ram and Shyam is 4/5, i.e. the number of items rated by both divided by the number of items rated by either.
The problem with this approach is that it ignores the value of the rating and only considers whether a rating is present or not.
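The calculation above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names are my own assumptions, a "?" is stored as None, and a rating of 0 is counted as a rating.

```python
# Ratings from the table above; None marks an unknown rating ("?").
# The dictionary layout is an assumption made for this sketch.
ratings = {
    "Ram":   {"MoMo": 5, "Pizza": 5, "Biriyani": None, "Noodles": 0, "Chicken Roll": 0},
    "Shyam": {"MoMo": 0, "Pizza": 0, "Biriyani": 2, "Noodles": 4, "Chicken Roll": 5},
}

def rated_items(user_ratings):
    """Return the set of items the user has actually rated."""
    return {item for item, value in user_ratings.items() if value is not None}

def jaccard_similarity(a, b):
    """|A intersection B| / |A union B| over the sets of rated items."""
    set_a, set_b = rated_items(a), rated_items(b)
    union = set_a | set_b
    if not union:
        return 0.0  # neither user has rated anything
    return len(set_a & set_b) / len(union)

print(jaccard_similarity(ratings["Ram"], ratings["Shyam"]))  # 0.8, i.e. 4/5
```

Note that the rating values never enter the calculation, which is exactly the weakness described above.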

The next option is cosine similarity. It is defined as:
cos(A, B) = (A · B) / (|A| |B|)
Now the cosine similarity between Ram and Shyam is computed as follows:
Ram = (5, 5, ?, 0, 0) = (5, 5, 0, 0, 0), where we fill the unknown rating with zero.
Shyam = (0, 0, 2, 4, 5)
Now sim(Ram, Shyam) = (5x0 + 5x0 + 0x2 + 0x4 + 0x5) / (sqrt(5x5 + 5x5) x sqrt(2x2 + 4x4 + 5x5)) = 0
i.e. the two vectors are orthogonal: Ram and Shyam have no preferences in common. Because it uses the rating values, cosine similarity gives a better estimate of similarity than Jaccard. There are many improvements on cosine similarity, such as centered cosine (Pearson correlation), and so on.
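As a minimal sketch of the cosine calculation, assuming the unknown ratings have already been filled with zero as described above (the function name and the list layout are illustrative choices of mine):

```python
import math

def cosine_similarity(a, b):
    """cos(A, B) = (A . B) / (|A| |B|) for two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)

# Item order: MoMo, Pizza, Biriyani, Noodles, Chicken Roll; "?" replaced by 0.
ram = [5, 5, 0, 0, 0]
shyam = [0, 0, 2, 4, 5]
hari = [0, 0, 5, 0, 0]

print(cosine_similarity(ram, shyam))   # 0.0
print(cosine_similarity(shyam, hari))  # positive: both rated Biriyani above zero
```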

Rating Prediction
Suppose we want to predict the rating of user x for item i. We select the N users most similar to x who have also rated item i, and take the average of their ratings for item i as the predicted rating of item i by user x. This is a very simple approach. Another approach is to take a weighted average, in which each neighbour's rating is weighted by that neighbour's similarity to x.

The approach explained above is user-based collaborative filtering. The other version of collaborative filtering is the item-based approach, which is very similar to the user-based one: we find the similarity between items and then use it to predict the rating of item i by user x.
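The prediction step can be sketched as follows, using the ratings table from earlier in this post. The names ITEMS, RATINGS, cosine and predict are hypothetical ones chosen for this sketch, unknown ratings are filled with zero when computing cosine similarity, and N defaults to 2 only because the example has so few users.

```python
import math

ITEMS = ["MoMo", "Pizza", "Biriyani", "Noodles", "Chicken Roll"]
# One rating vector per user, in ITEMS order; None marks "?".
RATINGS = {
    "Ram":   [5, 5, None, 0, 0],
    "Shyam": [0, 0, 2, 4, 5],
    "Hari":  [0, None, 5, 0, 0],
    "Gopal": [0, 0, None, 4, None],
}

def cosine(a, b):
    """Cosine similarity with unknown ratings filled with zero."""
    a = [0 if x is None else x for x in a]
    b = [0 if y is None else y for y in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(user, item, n=2, weighted=False):
    """Predict user's rating for item from the n most similar users who rated it."""
    idx = ITEMS.index(item)
    candidates = [
        (cosine(RATINGS[user], vec), vec[idx])
        for name, vec in RATINGS.items()
        if name != user and vec[idx] is not None
    ]
    neighbours = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:n]
    if not neighbours:
        return None  # nobody else has rated this item
    if weighted:
        total = sum(sim for sim, _ in neighbours)
        if total == 0:
            return None  # no neighbour is similar at all
        return sum(sim * rating for sim, rating in neighbours) / total
    return sum(rating for _, rating in neighbours) / len(neighbours)

print(predict("Gopal", "Chicken Roll"))                 # simple average: 2.5
print(predict("Gopal", "Chicken Roll", weighted=True))  # weighted average
```

For Gopal, only Shyam has a non-zero similarity, so the weighted average follows Shyam's rating of 5, while the simple average is pulled down to 2.5 by a neighbour with zero similarity. This is the main argument for weighting.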
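A rough sketch of the item-based variant follows, again on the ratings table from this post. Here each item is represented by the vector of ratings it received from the users, and the prediction is a similarity-weighted average over the items the user has already rated. The names USERS, ITEM_RATINGS and predict_item_based are assumptions made for this illustration.

```python
import math

USERS = ["Ram", "Shyam", "Hari", "Gopal"]
# One rating vector per item, in USERS order; None marks "?".
ITEM_RATINGS = {
    "MoMo":         [5, 0, 0, 0],
    "Pizza":        [5, 0, None, 0],
    "Biriyani":     [None, 2, 5, None],
    "Noodles":      [0, 4, 0, 4],
    "Chicken Roll": [0, 5, 0, None],
}

def cosine(a, b):
    """Cosine similarity with unknown ratings filled with zero."""
    a = [0 if x is None else x for x in a]
    b = [0 if y is None else y for y in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict_item_based(user, item, n=2):
    """Weighted average of the user's ratings on the n items most similar to item."""
    u = USERS.index(user)
    scored = [
        (cosine(ITEM_RATINGS[item], vec), vec[u])
        for other, vec in ITEM_RATINGS.items()
        if other != item and vec[u] is not None
    ]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no rated item is similar to the target item
    return sum(sim * rating for sim, rating in top) / total

# Noodles is the only item similar to Chicken Roll that Gopal has rated (4).
print(predict_item_based("Gopal", "Chicken Roll"))
```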

Thursday, August 3, 2017

Frequent GRE words (Version 1)

Basic Words
1.      Escalate /v/: to become greater, worse, or more serious. e.g. His financial problems escalated after he became unemployed.
2.      Revise /v/: to look at something again in order to improve it. e.g. I was asked to revise my proposal before submitting it again.
3.      Plummet /v/: to decline sharply e.g. plummeting supply
4.      Augment /v/:
a.       to increase e.g. the price was augmented.
b.      to uplift e.g. the seat was augmented.
c.       to improve or make better e.g. the meal was augmented.
d.      to enlarge e.g. the photograph was augmented.
5.      Soar /v/: to rocket or skyrocket e.g. soaring prices (figurative)
6.      Jeopardize /v/: to endanger e.g. jeopardize culture / By failing his finals, he jeopardized his whole future.
7.      Composure /n/: mental calmness. e.g. a composed pilot.
a.       /opp/ discomposure
8.      Abstract /n, adj/
a.       /n/ summary e.g. the abstract of a story
b.      /adj/ not representing a particular figure e.g. abstract art
c.       /adj/ not concrete e.g. an abstract idea
9.      Archaic /adj/:
a.       very old or outdated e.g. the archaic meaning of a word
10.  Perturb /v/: to worry someone
11.  ephemeral/adj/: transient/fugitive/ momentary/fleeting/evanescent
12.  Momentous/adj/ long lasting
13.  momentum /n/- encouragement/stimulus /incentive /drive/impetus
14.  Momentary /adj/- transient/ ephemeral
15.  Drivel /v/- to talk nonsense or nonsense
16.  Gainsaid /V/- to contradict or to deny.
17.  immutable /adj/- unchangeable system
18.  Specious /adj/- seemingly true but false
19.  spacious /adj/- having comfortable space
20.  Erratic /adj/-Fluctuating eg erratic oil price.
21.  Discreet /adj/- careful and sensible so that others may not realize or notice e.g. The journalist was following the prime minister in a discreet manner. (Not to be confused with discrete /adj/- separate, distinct.)
22.  Surreptitious /adj/- secret so that others may not notice e.g. a surreptitious glance.
/syn/- furtive e.g. They exchanged furtive smiles.
23.  Implicit /adj/- hidden; implied but not directly stated
24.  Explicit /adj/- clearly stated.
25.  Explicate /v/: to explain in detail with logic. Related: explicable /adj/ e.g. an explicable problem.
26.  Boast /v/:
a.       to brag e.g. don't brag
b.      to have (as a matter of pride) e.g. Nepal boasts Mt. Everest.
c.       to express something with energy and pride e.g. He boasted that he had won the match.
27.  Lackadaisical /adj/- with no energy or enthusiasm.
28.  Garrulous /adj/- having a habit of talking a lot about unimportant matters.
29.  Talkative /adj/- willing to talk a lot.
30.  Skeptic /n/- a non-believer; someone who doubts the truth (adj: skeptical).







Research Paper: Reading and Review



Re-Search

Research is a systematic inquiry that investigates hypotheses, suggests new interpretations of data or texts, and poses new questions for future research to explore.

Research consists of:

Asking a question that nobody has asked before;
Doing the necessary work to find the answer;
Communicating the knowledge you have acquired to a larger audience.

Types of Research
Qualitative: interviews, surveys, and observation.
Quantitative: objective measurement and quantitative analysis (statistics).
Correlation/Regression Analysis
Experimental:
And so on.


A research paper is not
simply an informed summary of a topic by means of primary and secondary sources
a book report
an opinion piece
an expository essay consisting solely of one's interpretation of a text
an overview of a particular topic

The rest of the presentation  can be found here