Day 112(DL) — NLP Data Preprocessing — Part 2

This post continues the series on the preprocessing steps followed in NLP use cases, covering two topics:

  • Usage of wordnet library
  • Lemmatization

Usage of wordnet library: WordNet is a lexical database of English that groups words into sets of synonyms (synsets), so it can be treated as a dictionary of synonyms. Using the interface that nltk provides for it, we can check whether a token is a recognised English word. If it is a junk token, it can be dropped, which reduces the number of words we have to process.

Let’s consider a sample text,

Text = 'skype problem Requires approval erp details'

We can say that the words ‘skype’ and ‘erp’ are not regular English words but technical terms. We can now use wordnet to verify whether each word belongs to the English language.

import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet

def check_english(x):
    # collect the words that wordnet does not recognise
    nonenglish_list = []
    for word in x.split(' '):
        # synsets() returns an empty list when the word is not in wordnet
        if not wordnet.synsets(word):
            print(word)
            nonenglish_list.append(word)
    return ' '.join(nonenglish_list)

wordnet.synsets returns the list of synsets for a word, and an empty list means that wordnet does not know the word. When the lines above are executed, the words ‘skype’ and ‘erp’ are captured in the list of non-English words.

check_english(Text)

skype
erp

As expected, the two words are printed in the output.
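
The same lookup can be turned around to drop the junk tokens instead of listing them. The sketch below is only an illustration and not the code from the original notebook: `drop_unknown_tokens` and its `is_known` parameter are hypothetical names, and `TOY_VOCAB` is a tiny stand-in dictionary so that the example runs without downloading any corpus. With nltk installed, passing `lambda w: bool(wordnet.synsets(w))` as `is_known` should give the wordnet-backed behaviour.

```python
# Tiny stand-in vocabulary (an assumption, for illustration only).
TOY_VOCAB = {"problem", "requires", "approval", "details"}

def toy_is_known(word):
    """Return True if the lowercased word is in the stand-in vocabulary."""
    return word.lower() in TOY_VOCAB

def drop_unknown_tokens(text, is_known=toy_is_known):
    """Keep only the tokens that the lookup recognises."""
    # split() with no argument also collapses repeated whitespace
    return ' '.join(word for word in text.split() if is_known(word))

Text = 'skype problem Requires approval erp details'
cleaned = drop_unknown_tokens(Text)
print(cleaned)
```

Keeping the lookup as a parameter means the filtering logic stays the same whichever dictionary is plugged in.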

Lemmatization: As we’ve already discussed, lemmatization reduces each word to its root form (the lemma), which shrinks the vocabulary because the different inflections of a word collapse into a single entry. Besides nltk, another package for NLP requirements is spaCy. For this experiment, we’ll use the lemmatization functionality from the spaCy package.

For instance, the input text = “sometimes i wonder walking is better than driving”

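# use spacy lemmatizer
import spacy

# 'en' is the spaCy 2.x shortcut; spaCy 3 and later need the full
# model name instead, e.g. 'en_core_web_sm'
nlp = spacy.load('en')

def spacy_lema(x):
    doc = nlp(x)
    return ' '.join([token.lemma_ for token in doc])

Text = 'sometimes i wonder walking is better than driving'
spacy_lema(Text)

sometimes i wonder walk be well than drive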

If we observe closely, walking and driving have been replaced with their respective root words walk and drive. In the same way, is has become be and better has become well.
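
To see why lemmatization shrinks the vocabulary, it helps to count the distinct tokens before and after. The following is a rough sketch under stated assumptions: `TOY_LEMMAS` is a hand-written lookup table standing in for a real lemmatizer, and `distinct_tokens` is a hypothetical helper; neither comes from spaCy. In practice the `lemmatize` argument would be a function that returns `token.lemma_` for a word.

```python
# Hand-written lemma table (an assumption; a real lemmatizer would replace it).
TOY_LEMMAS = {"walking": "walk", "walks": "walk", "walked": "walk",
              "driving": "drive", "drives": "drive"}

def toy_lemmatize(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    return TOY_LEMMAS.get(word, word)

def distinct_tokens(text, lemmatize=None):
    """Return the set of distinct tokens, optionally after lemmatization."""
    words = text.lower().split()
    if lemmatize is not None:
        words = [lemmatize(w) for w in words]
    return set(words)

sample = "walking walks walked driving drives"
before = distinct_tokens(sample)
after = distinct_tokens(sample, toy_lemmatize)
print(len(before), "distinct tokens before lemmatization")
print(len(after), "distinct tokens after lemmatization")
```

In this toy sample, five surface forms collapse into two lemmas, which is the kind of reduction lemmatization aims for on a real corpus.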

The entire code can be found in the GitHub repository.

