Data Science Mini Project: NLP on Reviews – Cleanup Pt. 2

This series of projects is your first step toward deriving insights from the gold mine of data that is text, and your first foray into NLP. In this project you will learn further ways to clean and pre-process text data. This is part 3 of the series.

Project Features

Good for Intermediate Level
100 points for Enrolling in Project
500 points for Submitting Solution

5/5 (1) 7+ Enrolled Learners
2 Lessons

Project Problem Statement

So far, you have loaded the data and made word clouds. You realised that cleanup was needed, and rightly so. In the previous microproject you did some preprocessing to break the sentences into individual words, and you now have one big consolidated list of all the words. In this microproject, you will clean up those words. Cleanup is one of the most important steps in any NLP project. The two main tasks for you in this first pre-processing part are:
  1. Remove the punctuation and numbers from the data
  2. Normalize the case (convert everything to lowercase)
Submit your solution as a .py file or a Jupyter notebook. Make sure to provide your insights as comments or markdown cells alongside the code.


  1. Remove everything that is not a letter; a regular expression (regex) makes this easy.
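
As a minimal sketch of the hint above, assume the words from the previous microproject sit in a plain Python list. The name `words`, the sample values, and the helper `clean_words` are all made up for illustration and are not part of the project material. Under those assumptions, the two tasks could look roughly like this:

```python
import re

# Hypothetical sample; in the project this would be your consolidated word list.
words = ["Great", "product!!", "5", "stars", "don't", "100%", "LOVED", "it."]

# Matches every character that is not an ASCII letter.
NON_LETTERS = re.compile(r"[^a-zA-Z]")


def clean_words(tokens):
    """Strip non-letter characters, lowercase, and drop tokens left empty."""
    cleaned = []
    for token in tokens:
        letters_only = NON_LETTERS.sub("", token).lower()
        if letters_only:  # tokens such as "5" become "" and are skipped
            cleaned.append(letters_only)
    return cleaned


cleaned_words = clean_words(words)
print(cleaned_words)
```

Note that this pattern keeps only ASCII letters, so "don't" becomes "dont" and accented characters are removed. Whether that is acceptable depends on your data.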
We would like you to try it out on your own first. You will get the project solution one week after enrolling. All the best!
