A-SOLO/SMS_Spam_Detection_using_NLP

SMS_Spam_Detection_using_NLP

A supervised machine learning project using Natural Language Processing (NLP) that classifies a given SMS message as spam or ham.

spam_ham_5

Dependencies/Libraries used:

  • pandas
  • NumPy
  • Seaborn
  • scikit-learn
  • NLTK
  • Pickle
  • Streamlit

Input data:

Dataset - https://www.kaggle.com/datasets/bagavathypriya/spam-ham-dataset

Building Model:

  • Imbalanced Data

spam_ham_M1
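
A minimal sketch of how the class imbalance could be quantified before modelling. The label list below is made up for illustration and is not taken from the Kaggle dataset; `class_balance` is a hypothetical helper name.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of samples that belongs to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up labels for illustration only (not the real dataset).
labels = ["ham", "ham", "ham", "spam", "ham", "ham", "spam", "ham"]
balance = class_balance(labels)
print(balance)
```

When one class dominates like this, accuracy alone can look good even for a weak classifier, which is one reason to also track precision.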

  • Visualizing the distribution of the number of characters, words, and sentences in spam and ham SMS messages

spam_ham_M2
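
The character, word, and sentence counts plotted above could be computed roughly as follows. This is a sketch using only the standard library; the project itself may rely on NLTK's tokenizers, and `text_stats` is a hypothetical helper name.

```python
import re


def text_stats(message: str) -> dict:
    """Return simple length statistics for one SMS message."""
    # Words: runs of letters, digits, or underscores.
    words = re.findall(r"\w+", message)
    # Naive sentence split on '.', '!' or '?'; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    return {
        "num_characters": len(message),
        "num_words": len(words),
        "num_sentences": len(sentences),
    }


# Made-up example message.
stats = text_stats("Free entry! Text WIN to 80082 now.")
print(stats)
```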

  • Plotting the trend as a Heatmap

spam_ham_M3

  • Most common words in a Spam Corpus

spam_ham_M4

  • Performance of Various Models

spam_ham_M5

Results:

  • Applied NLP techniques such as tokenization, lemmatization, and stop-word and punctuation removal using NLTK and regular expressions.
  • Performed feature engineering with handcrafted features such as digit count and email length.
  • Trained several classification models, including Naive Bayes, Support Vector Classifier (SVC), and Extra Trees Classifier (ETC), to find the best performer.
  • Created an ensemble model, improving accuracy from 97.87% to 98.25%.
  • Designed and deployed a basic UI with Streamlit for classifying new inputs as spam or ham.
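
The cleaning and feature-engineering steps listed above can be sketched as follows. This is an illustration, not the project's code: it uses a tiny hard-coded stop-word set instead of NLTK's English list, and it leaves out lemmatization, which in NLTK needs the WordNet data to be downloaded. The names `STOP_WORDS`, `preprocess`, and `digit_count` are hypothetical.

```python
import string
from typing import List

# Tiny illustrative stop-word set; assumption: the project uses NLTK's full list.
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "and", "for", "of"}


def preprocess(message: str) -> List[str]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    text = message.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [token for token in text.split() if token not in STOP_WORDS]


def digit_count(message: str) -> int:
    """Handcrafted feature: number of digit characters in the raw message."""
    return sum(ch.isdigit() for ch in message)


# Made-up example message.
sms = "You have WON a prize! Call 09061701461 to claim."
tokens = preprocess(sms)
digits = digit_count(sms)
print(tokens, digits)
```

The digit count is taken from the raw message rather than the cleaned tokens, so phone numbers and prize amounts still contribute to the feature.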

Top Performing Model

Combined a Support Vector Classifier, Multinomial Naive Bayes, and an Extra Trees Classifier into a Stacking Classifier (an ensemble model) with:

  • Accuracy: 0.9825918762088974

  • Precision: 0.9736842105263158
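
A stacking ensemble of this kind could be assembled with scikit-learn roughly as below. This is a sketch under assumptions, not the repository's code: the TF-IDF vectorizer, the linear SVC kernel, the logistic-regression final estimator, the hyperparameters, and the toy messages are all chosen here for illustration.

```python
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up toy messages (1 = spam, 0 = ham); not taken from the Kaggle dataset.
messages = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent call now to win cash",
    "free entry text win to claim",
    "you won a prize call now",
    "cash reward waiting claim now",
    "are we meeting for lunch today",
    "see you at home tonight",
    "can you call me after work",
    "thanks for the lunch today",
    "i will be home late tonight",
    "let us meet after work",
]
labels = [1] * 6 + [0] * 6

stack = StackingClassifier(
    estimators=[
        ("svc", SVC(kernel="linear")),
        ("nb", MultinomialNB()),
        ("etc", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # assumption: source does not name it
    cv=3,  # small fold count because the toy data set is tiny
)

# Vectorize the raw text, then feed the features to the stacked ensemble.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(messages, labels)

# Scored on the training messages only to show the metric calls.
predictions = model.predict(messages)
accuracy = accuracy_score(labels, predictions)
precision = precision_score(labels, predictions, zero_division=0)
print(accuracy, precision)
```

Precision is reported alongside accuracy because a false positive here means a legitimate message gets flagged as spam. Scores from this toy data are not comparable to the figures above.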

Testing Examples:

  • Example 1: spam_ham_2

  • Example 2: spam_ham_4

  • Example 3: spam_ham_3
