If you are a student or a professional looking for open-source NLP projects, this article is here to help you.

Natural Language Processing (NLP) is the branch of Artificial Intelligence that helps computers understand and communicate in human language. As the online community has grown, open-source NLP projects have been adopted in a wide range of fields, and NLP has contributed to the rapid growth of AI over the last few years through the development of intelligent systems and robots.

Here is a list of open-source NLP projects that can be implemented using Python.

Natural Language Processing (NLP) Data Science Projects

1. Text Summarizer – Video Tutorial, GitHub Code

Text Summarizer is a project that condenses long paragraphs of text into a single-line summary. It can turn an article into a summary using Python and the Keras library. The project builds on NLP concepts such as word embeddings and encoder-decoder models and is fairly easy to understand. Have a look at this tutorial on creating a basic text summarizer in Python using the gensim library.

Full code for Text Summarization in Python:

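# import the gensim module and summarize function
# Note: the gensim.summarization module was removed in gensim 4.0,
# so this code needs an older release (e.g. pip install "gensim<4")
from gensim.summarization.summarizer import summarize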

# Paragraph
paragraph = "Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken. NLP is a component of artificial intelligence (AI). The development of NLP applications is challenging because computers traditionally require humans to 'speak' to them in a programming language that is precise, unambiguous and highly structured, or through a limited number of clearly enunciated voice commands. Human speech, however, is not always precise -- it is often ambiguous and the linguistic structure can depend on many complex variables, including slang, regional dialects and social context."

# Get the summary of the text based on a ratio (40% of the original content).
summ_per = summarize(paragraph, ratio = 0.4)
print("Percent summary:") 
print(summ_per) 

# Get the summary of the text based on number of words (50 words) 
summ_words = summarize(paragraph, word_count = 50) 
print("\n")
print("Word count summary:") 
print(summ_words) 
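
Because `gensim.summarization` is not available in recent gensim releases, the idea of extractive summarization can also be sketched without any library. The example below is a minimal illustration, not gensim's TextRank implementation: it scores each sentence by the average frequency of its words and keeps the top-scoring ones. The function names and the tiny stop-word list are assumptions made for this sketch.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not gensim's list).
STOP_WORDS = {"a", "an", "and", "as", "in", "is", "it", "of", "or", "that", "the", "to"}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase the text and return its words, without stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize_by_frequency(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Use the average frequency so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize_by_frequency(paragraph, max_sentences=2)` on the paragraph above returns two of its sentences unchanged. Word frequency is a much cruder signal than the graph-based ranking gensim used, so treat the output as a baseline.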

2. Personal Voice Assistant – Tutorial, GitHub Code, Video Tutorial

Most of us are familiar with the term Voice Assistant thanks to the rise of devices from Apple (Siri), Google (Google Assistant), Amazon (Alexa), Microsoft (Cortana), and many more. Voice assistants perform tasks or services according to the user’s commands.

Have a look at this tutorial to create Jarvis in Python, an open-source voice assistant available for Linux, macOS, and Windows as a command-line tool. Jarvis can be built easily in Python and used for voice-based commands on the desktop. The tutorial walks through commands such as “Open Reddit”, “Open website”, “Current weather”, “Send email”, etc.

Full code for creating a voice assistant in Python:

from gtts import gTTS
import speech_recognition as sr
import os
import re
import subprocess
import webbrowser
import smtplib
import requests
from weather import Weather

def talkToMe(audio):
    "speaks audio passed as argument"

    print(audio)
    # speak one line at a time; an argument list avoids shell quoting problems
    for line in audio.splitlines():
        subprocess.run(["say", line])

    #  uses the system's built-in say command (macOS) instead of mpg123
    #  text_to_speech = gTTS(text=audio, lang='en')
    #  text_to_speech.save('audio.mp3')
    #  os.system('mpg123 audio.mp3')


def myCommand():
    "listens for commands"

    r = sr.Recognizer()

    with sr.Microphone() as source:
        print('Ready...')
        r.pause_threshold = 1
        r.adjust_for_ambient_noise(source, duration=1)
        audio = r.listen(source)

    try:
        command = r.recognize_google(audio).lower()
        print('You said: ' + command + '\n')

    # call myCommand() again to keep listening if the speech could not be recognized
    except sr.UnknownValueError:
        print('Your last command couldn\'t be heard')
        command = myCommand()

    return command


def assistant(command):
    "if statements for executing commands"

    if 'open reddit' in command:
        reg_ex = re.search('open reddit (.*)', command)
        url = 'https://www.reddit.com/'
        if reg_ex:
            subreddit = reg_ex.group(1)
            url = url + 'r/' + subreddit
        webbrowser.open(url)
        print('Done!')

    elif 'open website' in command:
        reg_ex = re.search('open website (.+)', command)
        if reg_ex:
            domain = reg_ex.group(1)
            url = 'https://www.' + domain
            webbrowser.open(url)
            print('Done!')
        else:
            pass

    elif 'what\'s up' in command:
        talkToMe('Just doing my thing')
    elif 'joke' in command:
        res = requests.get(
                'https://icanhazdadjoke.com/',
                headers={"Accept":"application/json"}
                )
        if res.status_code == requests.codes.ok:
            talkToMe(str(res.json()['joke']))
        else:
            talkToMe('Oops! I ran out of jokes')

    elif 'current weather in' in command:
        reg_ex = re.search('current weather in (.*)', command)
        if reg_ex:
            city = reg_ex.group(1)
            weather = Weather()
            location = weather.lookup_by_location(city)
            condition = location.condition()
            talkToMe('The current weather in %s is %s. The temperature is %.1f degrees Celsius.' % (city, condition.text(), (int(condition.temp())-32)/1.8))

    elif 'weather forecast in' in command:
        reg_ex = re.search('weather forecast in (.*)', command)
        if reg_ex:
            city = reg_ex.group(1)
            weather = Weather()
            location = weather.lookup_by_location(city)
            forecasts = location.forecast()
            for i in range(0,3):
                talkToMe('On %s it will be %s. The maximum temperature will be %.1f degrees. '
                         'The lowest temperature will be %.1f degrees.' % (forecasts[i].date(), forecasts[i].text(), (int(forecasts[i].high())-32)/1.8, (int(forecasts[i].low())-32)/1.8))


    elif 'email' in command:
        talkToMe('Who is the recipient?')
        recipient = myCommand()

        # myCommand() returns lowercase text, so compare against a lowercase name
        if 'john' in recipient:
            talkToMe('What should I say?')
            content = myCommand()

            #init gmail SMTP
            mail = smtplib.SMTP('smtp.gmail.com', 587)

            #identify to server
            mail.ehlo()

            #encrypt session
            mail.starttls()

            #login
            mail.login('username', 'password')

            #send message
            mail.sendmail('John Fisher', '[email protected]', content)

            #end mail connection (quit() sends QUIT before closing)
            mail.quit()

            talkToMe('Email sent.')

        else:
            talkToMe('I don\'t know what you mean!')


talkToMe('I am ready for your command')

#loop to continue executing multiple commands
while True:
    assistant(myCommand())
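
The chain of `if`/`elif` checks in `assistant()` can also be expressed as a table of patterns, which makes the parsing step testable without a microphone. The following is a minimal sketch under that assumption; `COMMAND_PATTERNS`, `parse_command`, and `reddit_url` are hypothetical names introduced here and are not part of the tutorial's code.

```python
import re

# Each entry pairs a regular expression with the (hypothetical) action it triggers.
COMMAND_PATTERNS = [
    (re.compile(r"open reddit(?: (?P<arg>.+))?"), "open_reddit"),
    (re.compile(r"open website (?P<arg>.+)"), "open_website"),
    (re.compile(r"current weather in (?P<arg>.+)"), "current_weather"),
]


def parse_command(command):
    """Return (action, argument) for the first matching pattern, else (None, None)."""
    text = command.lower().strip()
    for pattern, action in COMMAND_PATTERNS:
        match = pattern.search(text)
        if match:
            return action, match.group("arg")
    return None, None


def reddit_url(subreddit=None):
    """Build the URL the 'open reddit' command would open."""
    base = "https://www.reddit.com/"
    return base + "r/" + subreddit if subreddit else base
```

With this split, `parse_command("Open Reddit python")` yields the action name and the subreddit, and a separate step decides what to do with them (for example `webbrowser.open(reddit_url("python"))`). Adding a command then means adding one row to the table.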

3. Automated Keyword Extraction – Tutorial, GitHub Code, Video Tutorial

Keywords are an integral part of any article. They play a crucial role in page ranking systems and categorization algorithms in search engines. The purpose of this project is to identify keywords in a paragraph of text. This short tutorial on automatic keyword extraction using the gensim library in Python is quick and easy to learn.

Full code for Keyword Extraction in Python:

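# import the gensim module and keywords function
# Note: the gensim.summarization module was removed in gensim 4.0,
# so this code needs an older release (e.g. pip install "gensim<4")
from gensim.summarization import keywords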

# Paragraph
paragraph = "Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken. NLP is a component of artificial intelligence (AI). The development of NLP applications is challenging because computers traditionally require humans to 'speak' to them in a programming language that is precise, unambiguous and highly structured, or through a limited number of clearly enunciated voice commands. Human speech, however, is not always precise -- it is often ambiguous and the linguistic structure can depend on many complex variables, including slang, regional dialects and social context."

# Get the keywords from the paragraph 
keywords_txt = keywords(paragraph) 
print(keywords_txt) 
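
For readers on a gensim release without `gensim.summarization`, here is a library-free sketch of the same task. It is a deliberately simpler technique than gensim's graph-based ranking: it counts how often each non-stop-word occurs and returns the most frequent ones. The function name `top_keywords` and the short stop-word list are assumptions for illustration.

```python
import re
from collections import Counter

# Short illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"and", "are", "for", "is", "that", "the", "this", "with"}


def top_keywords(text, k=5):
    """Return the k most frequent words, ignoring stop words and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Running `top_keywords(paragraph)` on the paragraph above gives a rough keyword list. Raw frequency tends to favour generic words, so a longer stop-word list or TF-IDF weighting over several documents usually improves the result.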

4. Sentiment Analysis – Tutorial, GitHub Code, Video Tutorial

Sentiment analysis involves identifying sentences and categorizing them as positive or negative. This categorization is used to determine the writer’s attitude towards the subject of the sentence. Analyzing tweets is a good way to get started with sentiment analysis in Python.

This video by Siraj Raval on Twitter Sentiment Analysis is quick and easy to learn.
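
To make the positive/negative categorization concrete, here is a minimal lexicon-based sketch. It is not the method used in the video; it only counts words from two small hand-written word lists, which are assumptions chosen for illustration, and adds a neutral label for sentences where neither list wins.

```python
import re

# Tiny hand-written lexicons (assumptions for this sketch, not a real sentiment lexicon).
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def classify_sentiment(sentence):
    """Label a sentence by comparing its counts of positive and negative words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A word-counting approach like this ignores negation ("not good") and sarcasm, which is one reason trained classifiers are normally used for tweets.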

5. Topic Identification – Tutorial, GitHub Code, Video Tutorial

Topic identification is a type of multi-label classification that identifies the topics of articles. It can be used to classify magazine articles, newspaper articles, etc. from their headlines or titles. It uses an automated algorithm that reads through text documents and outputs the topics discussed. It has also been used extensively to sort email into categories in the inbox.

Watch this extensive video tutorial explaining all the concepts required for topic identification in Python.

6. Multi-label Text Classification – Tutorial, GitHub Code, Video Tutorial

The purpose of the project is to create a multi-label text classification system that automatically assigns tags to questions posted on a forum such as Stack Overflow or Quora. It can also be used to classify newspaper articles and other documents.

Watch this tutorial series on Text Classification in Keras to make a news classifier in Python.

Text Classification in Keras (Part 2)
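
What makes a task multi-label is that one question can carry several tags at once, so the targets are usually encoded as multi-hot vectors rather than a single class index. The sketch below illustrates only that encoding step in plain Python; it is not taken from the tutorial series, and the helper names (`build_tag_index`, `to_multi_hot`, `decode_tags`) are hypothetical.

```python
def build_tag_index(tagged_examples):
    """Map each distinct tag to a column index, in sorted order."""
    tags = sorted({tag for _, tag_list in tagged_examples for tag in tag_list})
    return {tag: i for i, tag in enumerate(tags)}


def to_multi_hot(tag_list, tag_index):
    """Encode a list of tags as a 0/1 vector with one position per known tag."""
    vector = [0] * len(tag_index)
    for tag in tag_list:
        vector[tag_index[tag]] = 1
    return vector


def decode_tags(scores, tag_index, threshold=0.5):
    """Return every tag whose predicted score reaches the threshold."""
    return [tag for tag, i in tag_index.items() if scores[i] >= threshold]


# Example data (made up for illustration).
examples = [
    ("How do I sort a list?", ["python", "sorting"]),
    ("What is a pointer?", ["c"]),
]
tag_index = build_tag_index(examples)
targets = [to_multi_hot(tags, tag_index) for _, tags in examples]
```

A model trained on such targets typically ends in one sigmoid output per tag, and `decode_tags` shows how independent per-tag scores are turned back into a set of tags.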

7. Sentence to Sentence Semantic Similarity – Tutorial (gensim), Video Tutorial

The semantic similarity of two sentences is a measure of how close their meanings are. In other words, it measures how far the sentences share the same intent. This video tutorial on finding the semantic similarity between two sentences uses the spaCy module in Python.

Full Code for finding semantic similarity between sentences using spaCy in Python:

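# Python code to measure similarity between two sentences using cosine similarity.
# Requires a model with word vectors: python -m spacy download en_core_web_md
# (the "en" shortcut no longer works in spaCy 3)
import spacy

nlp = spacy.load("en_core_web_md")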

# Sentences
s1 = nlp("The weather is rainy.")
s2 = nlp("It is going to rain outside.")

# Calculate the similarity
print("The similarity is:",s1.similarity(s2))

8. Inference-based Chatbot System – Tutorial, GitHub Code, Video Tutorial

Chatbots are on the rise for several use cases in the hospitality and service industries, be it assistants for your device or waiters at a restaurant. In addition, intelligent chatbots have been increasingly adopted for customer service in several sectors. Some common examples are Siri, Alexa, Google Assistant, etc., which are used as voice assistants as well as chatbots.

This tutorial series by Sentdex uses TensorFlow to create a conversational chatbot in Python.

In Conclusion

How many of the above projects have you tried? Do you have any recommendations for us to include in the above list? Let us know.

Also, if you are trying to start or advance your career in the field of Computer Vision, you might like this article on “Open-Source Computer Vision Projects (With Tutorials)”.
