SMS_Spam_Detection_using_NLP

A supervised machine learning project using Natural Language Processing (NLP) that classifies a given SMS as spam or ham.

Dependencies/Libraries used:

Pandas
NumPy
Seaborn
scikit-learn
NLTK
Pickle
Streamlit

Input data:

Dataset - https://www.kaggle.com/datasets/bagavathypriya/spam-ham-dataset

Building Model:

Imbalanced Data

Visualizing the trend of number of characters, number of words and number of sentences in a spam/ham SMS

Plotting the trend as a Heatmap

Most common words in a Spam Corpus

Performance of Various Models

Results:

Applied NLP techniques such as tokenization, lemmatization, and stop-word and punctuation removal using NLTK and regular expressions.
Performed feature engineering with handcrafted features such as digit count and email length.
Implemented various classification models, including Naive Bayes, SVC, and Extra Trees Classifier (ETC), to find the best performer.
Created an ensemble model, improving the accuracy from 97.87% to 98.25%.
Designed & deployed a basic UI with Streamlit for classifying new inputs as spam or ham.
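
The preprocessing and feature-engineering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it uses only the Python standard library instead of NLTK so that it runs without corpus downloads, it skips lemmatization, and the stop-word list, function names, and sample message are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption);
# the project itself relies on NLTK for stop words.
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "your", "and", "for", "of", "in", "on"}


def clean_text(message):
    """Lowercase the message, tokenize it with a regex, and drop stop words.

    The regex keeps only runs of letters and digits, so punctuation
    is removed as a side effect of tokenization.
    """
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def handcrafted_features(message):
    """Compute simple numeric features from the raw (uncleaned) message."""
    return {
        "num_characters": len(message),
        "num_words": len(message.split()),
        "num_digits": sum(ch.isdigit() for ch in message),
    }


# Invented sample message, not taken from the dataset.
sample = "WINNER!! You have won a 1000 cash prize. Call 09061701461 to claim."
tokens = clean_text(sample)
features = handcrafted_features(sample)
print(tokens)
print(features)
```

Computing the numeric features on the raw text rather than on the cleaned tokens keeps signals such as long digit runs, which cleaning might otherwise alter.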

Top Performing Model

Combined a Support Vector Classifier, Multinomial Naive Bayes, and an Extra Trees Classifier into a Stacking Classifier (ensemble model) with the following scores:

Accuracy: 0.9825918762088974
Precision: 0.9736842105263158
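
A stacking ensemble of these three base models could be assembled with scikit-learn along the lines below. Treat it as a sketch under assumptions: the TF-IDF vectorizer, the logistic-regression final estimator, every hyperparameter, and the toy messages are illustrative choices that the README does not specify, so the scores it prints are unrelated to the figures reported above.

```python
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy messages invented for illustration; 1 = spam, 0 = ham.
messages = [
    "Win a free prize now, call 09061701461",
    "Congratulations, you won 1000 cash, claim today",
    "Free entry in a weekly competition, text WIN to 80086",
    "Urgent! Your mobile number has won a bonus prize",
    "Claim your free ringtone, reply YES now",
    "You have been selected for a cash reward, call now",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you send me the notes from class?",
    "Happy birthday, hope you have a great day",
    "Running late, see you in ten minutes",
    "Thanks for dinner last night, it was lovely",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The three base models named in the README; hyperparameters are assumptions.
base_models = [
    ("svc", SVC(kernel="sigmoid", gamma=1.0)),
    ("nb", MultinomialNB()),
    ("etc", ExtraTreesClassifier(n_estimators=50, random_state=2)),
]

# The final estimator is an assumption; cv=3 suits the tiny toy dataset.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=3,
)

# Vectorize the text, then feed the sparse features to the ensemble.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(messages, labels)

# Scores on the training data only, to show the metric calls.
predictions = model.predict(messages)
accuracy = accuracy_score(labels, predictions)
precision = precision_score(labels, predictions, zero_division=0)
print(f"accuracy={accuracy:.3f} precision={precision:.3f}")
```

Precision is reported alongside accuracy because, on imbalanced spam data, it shows how often a message flagged as spam really is spam, which is the costly error for a filter.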

Testing Examples:

Example 1:
Example 2:
Example 3:
