NLP #1 : Current state of Natural Language Processing (NLP)

in #steemstem · 7 years ago (edited)

In the previous post, I shared my interest in NLP. The deeper I dive into NLP, the more I realize that I lack certain Data Science skills, so from the next post onwards I will take a step back into data preprocessing before diving into any NLP model.

machi.png
Royalty-free image from Pexels

Alright, back to this post. I will talk about the current state of NLP: what NLP is good at, what NLP struggles with, and why NLP is hard. You can check my previous post, where I talk about my goals for learning NLP.

What is NLP good at?

Natural Language Processing (NLP) started back in the 1950s with the famous Turing test, which measures how intelligent an AI is; after a few decades of improvements, the field has evolved a lot, especially now that computing power is sufficient for us to build all kinds of NLP models. However, NLP is not perfect: there are a few areas where it excels, and some areas that still require a lot of improvement.

Where does NLP excel?

Based on the current state of NLP, it excels at:

  • Checking for spam content. (e.g. Gmail has used this to detect spam mail since the 2000s)
  • Part-Of-Speech (POS) tagging, which assigns each word to its syntactic function (e.g. adjective, noun).
  • Named-Entity Recognition (NER), which is used to extract entities from sentences (names, organizations, locations).
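To make the spam-checking item concrete, here is a minimal sketch of the classic Naive Bayes approach behind early email spam filters. The tiny training set, word lists, and labels below are made up purely for illustration; real filters train on millions of messages.

```python
import math
from collections import Counter

# Toy training data: (text, label) pairs, invented for this sketch.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting tomorrow at noon", "ham"),
    ("lunch with the team tomorrow", "ham"),
]

# Count word frequencies per class, and documents per class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the most likely class using log-probabilities
    with add-one (Laplace) smoothing."""
    scores = {}
    for label in word_counts:
        # Prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Likelihood with Laplace smoothing to avoid log(0).
            score += math.log(
                (word_counts[label][word] + 1) / (total + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("claim your free money"))   # spam-like words
print(classify("see you at the meeting"))  # ham-like words
```

The same counting-plus-smoothing idea scales up directly; production systems mainly add better features and far more data.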

Where is NLP doing well?

NLP can do these too, but they are still improving:

  • Sentiment analysis, which understands human emotion towards certain things. (Useful for understanding consumer behaviour or predicting the stock of a certain company)
  • Machine translation. (Google Translate, the most common online translation tool)
  • Information extraction. (e.g. tag suggestions for a blog post)
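As a feel for how basic sentiment analysis can be, here is a lexicon-based scorer: count positive and negative words and compare. The word lists below are toy assumptions, not a real sentiment lexicon; modern systems learn these signals from data instead.

```python
# Toy sentiment lexicons, invented for this sketch.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    """Label a sentence 'positive', 'negative', or 'neutral'
    by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great product"))  # more positive hits
print(sentiment("this movie was terrible"))    # more negative hits
```

This approach fails on sarcasm, negation ("not good"), and slang, which is exactly why sentiment analysis is still improving.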

Which fields does NLP still need to improve in?

NLP is capable of doing these, but the results are not satisfying yet.

  • Machine conversation. (Just like Siri, Cortana or Google Now; sometimes they don't understand what we say)
  • Paraphrasing and summarization. (e.g. summarizing an article)
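To show why summarization is hard, here is a sketch of the classic frequency-based extractive approach: score each sentence by how frequent its words are, then keep the top sentences. The sample text is made up for illustration.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Extract the n highest-scoring sentences, where a sentence's
    score is its average word frequency across the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Keep the top-scoring sentences in their original order.
    chosen = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in chosen)

text = ("NLP is growing fast. NLP models need lots of data. "
        "My cat sleeps all day.")
print(summarize(text))
```

An extractive method like this can only copy sentences out; it cannot paraphrase or compress ideas the way a human summary does, which is where current systems still fall short.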

Why is NLP hard?

One of the toughest challenges for NLP is to turn all the words into math. However, languages are ambiguous.

Ambiguity of language

In language, it is quite common that we express the same thing in a few different ways, and a single sentence can carry several meanings. Just look at a simple sentence, "Are you free tomorrow?", analysed in terms of ambiguity. Check this Quora question.

Something as simple as that, where the word "free" can have different meanings.
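To illustrate "turning words into math", here is the simplest such representation, a bag-of-words sketch: each sentence becomes a vector of word counts over a shared vocabulary. The two example sentences are chosen to echo the "free" ambiguity above.

```python
# Two sentences where "free" means different things.
sentences = [
    "are you free tomorrow",
    "free prize inside",
]

# Build a shared vocabulary, then count occurrences per sentence.
vocab = sorted({w for s in sentences for w in s.split()})
vectors = [[s.split().count(w) for w in vocab] for s in sentences]

print(vocab)     # ['are', 'free', 'inside', 'prize', 'tomorrow', 'you']
for v in vectors:
    print(v)
```

Notice that both sentences get an identical count in the "free" column, even though "free" means "available" in one and "at no cost" in the other; a pure count-based representation throws that distinction away, which is exactly why ambiguity makes NLP hard.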

Use of informal language

Alright, let's admit it: humans are lazy. Therefore, we have created a lot of informal words in our daily conversations and messages.

In Australia, we would say "g'day mate" instead of "good day mate"; in messaging, we use informal language like "lol, u, FUD, HODL, netflix and chill", which the computer can't understand. (Sometimes we don't understand it either.)

Therefore, training a model to understand informal language is also a challenge.
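One common first step for informal text is normalization: map slang tokens to standard forms before feeding the text to a model. The slang table below is a toy assumption covering only this post's examples; real normalizers need much larger dictionaries and context.

```python
# Toy slang dictionary, covering only the examples in this post.
SLANG = {
    "gday": "good day",
    "u": "you",
    "lol": "laughing out loud",
}

def normalize(text):
    """Replace known slang tokens with their expanded forms,
    leaving unknown words untouched."""
    return " ".join(SLANG.get(w, w) for w in text.lower().split())

print(normalize("gday mate u good"))
```

A lookup table like this breaks as soon as slang carries punctuation or shifts meaning ("free" again), so normalization alone is not enough; it just reduces the noise the model has to learn through.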

Final thoughts

I am really excited to learn about NLP and apply my ideas to it. Since NLP is still growing, the possibilities for NLP products are endless.


