NLTK (Natural Language Toolkit) Tutorial in Python
What is Natural Language Processing?
Natural Language Processing (NLP) is the manipulation or understanding of text or speech by software or a machine. An analogy: humans interact, understand each other's views, and respond with an appropriate answer. In NLP, this interaction, understanding, and response are carried out by a computer instead of a human.
What is NLTK?
NLTK stands for Natural Language Toolkit. It is one of the most widely used NLP libraries and contains packages that help machines understand human language and respond to it appropriately. Tokenization, stemming, lemmatization, punctuation handling, character count, and word count are some of the operations these packages support, and they are discussed in this tutorial.
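As a rough illustration of a few of these operations, here is a minimal sketch, assuming NLTK is already installed. The sample sentence and the variable names are made up for this example. It uses `wordpunct_tokenize` and `PorterStemmer` because neither needs extra NLTK data to be downloaded.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Sample text (an assumption made for this illustration).
text = "The cats are running. A cat runs faster than the dogs!"

# Tokenization: split the text into word and punctuation tokens.
# wordpunct_tokenize is regex-based, so it needs no data download.
tokens = wordpunct_tokenize(text)

# Punctuation handling: keep only alphabetic tokens, lowercased.
words = [t.lower() for t in tokens if t.isalpha()]

# Stemming: reduce each word to a crude root form,
# e.g. "running" and "runs" both become "run".
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]

# Character count (including spaces and punctuation) and word count.
char_count = len(text)
word_count = len(words)

# Frequency distribution: how often each stem occurs.
stem_freq = FreqDist(stems)

print("Tokens:", tokens)
print("Stems:", stems)
print("Characters:", char_count, "Words:", word_count)
print("Occurrences of 'run':", stem_freq["run"])
```

Lemmatization is left out of this sketch because NLTK's `WordNetLemmatizer` needs the WordNet corpus, which has to be fetched first with `nltk.download("wordnet")`.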
Here is what we cover in the Course
NLP (Natural Language Processing) Tutorial: What is, History, Example
How to Download & Install NLTK on Windows/Mac
Tokenize Words and Sentences with NLTK
POS (Part-Of-Speech) Tagging & Chunking with NLTK
Stemming and Lemmatization with Python NLTK
WordNet with NLTK: Finding Synonyms for words in Python
Tagging Problems and Hidden Markov Model
Counting POS Tags, Frequency Distribution & Collocations in NLTK
Word Embedding Tutorial: word2vec using Gensim [EXAMPLE]
seq2seq (Sequence to Sequence) Model for Deep Learning with PyTorch
Various NLP Libraries
NLP Library | Description |
NLTK | One of the most widely used NLP libraries, often regarded as the mother of all NLP libraries. |
spaCy | A fast, optimized, and accurate library, widely used alongside deep learning. |
Stanford CoreNLP Python | A good choice for a client-server architecture. It is written in Java, but its modular design allows it to be used from Python. |
TextBlob | An NLP library that works in Python 2 and Python 3. It is used for processing textual data and exposes most common operations through an API. |
Gensim | Gensim is a robust open-source NLP library for Python. It is highly efficient and scalable. |
Pattern | A lightweight NLP module, generally used for web mining, crawling, and similar spidering tasks. |
Polyglot | For massively multilingual applications, Polyglot is the most suitable NLP library. Its features include language identification and entity extraction. |
PyNLPl | PyNLPl, pronounced 'pineapple', is a Python library. It provides parsers for many data formats, such as FoLiA, Giza, Moses, ARPA, Timbl, and CQL. |
Vocabulary | This library is well suited to retrieving semantic information about the words in a given text. |
In this tutorial, we will discuss only NLTK, one of the most popular NLP libraries.