A supervised machine learning project using Natural Language Processing (NLP) that classifies a given SMS message as spam or ham.
- Pandas
- NumPy
- Seaborn
- scikit-learn (sklearn)
- NLTK
- Pickle
- Streamlit
Dataset - https://www.kaggle.com/datasets/bagavathypriya/spam-ham-dataset
- Applied NLP techniques such as tokenization, lemmatization, and stop-word and punctuation removal using NLTK and regex.
- Performed feature engineering with handcrafted features such as digit count and email length.
- Implemented several classification models, including Naive Bayes, SVC, and Extra Trees Classifier (ETC), to find the best performer.
- Created an ensemble model, improving the accuracy from 97.87% to 98.25%.
- Designed & deployed a basic UI with Streamlit for classifying new inputs as spam or ham.
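
The preprocessing and handcrafted-feature steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it uses only the standard library, so NLTK tokenization and lemmatization are replaced by a plain whitespace split with no lemmatization, and `STOP_WORDS`, `clean_text`, and `handcrafted_features` are hypothetical names introduced here.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word set for illustration only.
# The project itself relies on NLTK's stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "you", "your", "of", "for", "on", "in"}

# Character class that matches any single ASCII punctuation character.
PUNCT_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")


def clean_text(message):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stop words."""
    lowered = message.lower()
    no_punct = PUNCT_PATTERN.sub(" ", lowered)
    return [token for token in no_punct.split() if token not in STOP_WORDS]


def handcrafted_features(message):
    """Compute simple numeric features from the raw (uncleaned) message."""
    return {
        "digit_count": sum(ch.isdigit() for ch in message),
        "length": len(message),
    }


# Made-up example message, not taken from the dataset.
sample = "WINNER!! Call 0800 123 456 to claim your prize."
tokens = clean_text(sample)
features = handcrafted_features(sample)
```

Computing the numeric features on the raw message, before cleaning, keeps signals such as digits and overall length that punctuation and stop-word removal would otherwise alter.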
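
A stacking ensemble over the three model families named in this README might be assembled with scikit-learn along these lines. Treat it as a sketch under assumptions: the TF-IDF vectorizer, the hyperparameters, the default logistic-regression meta-learner, and the toy messages are all illustrative choices made here, not details confirmed by the project.

```python
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy, made-up messages so the sketch runs end to end; NOT the Kaggle dataset.
texts = [
    "win a free prize now", "claim your cash reward today",
    "urgent call this number to win", "free entry text now to claim",
    "are we still meeting for lunch", "see you at home tonight",
    "can you send me the notes", "thanks for the help yesterday",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]


def build_stacking_model(cv=2):
    """Build a TF-IDF + stacking pipeline (vectorizer and settings are assumptions)."""
    base_estimators = [
        ("svc", SVC(kernel="linear")),
        ("mnb", MultinomialNB()),
        ("etc", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ]
    # With no final_estimator given, StackingClassifier uses LogisticRegression
    # as the meta-learner trained on the base models' cross-validated outputs.
    stack = StackingClassifier(estimators=base_estimators, cv=cv)
    return make_pipeline(TfidfVectorizer(), stack)


model = build_stacking_model()
model.fit(texts, labels)
predictions = model.predict(["free prize call now", "lunch at home tonight"])
```

The small `cv=2` is only there because the toy set has four messages per class; on a real dataset a larger number of folds would be the more usual choice.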
Combined Support Vector Classifier, Multinomial Naive Bayes, and Extra Trees Classifier to build a Stacking Classifier model (an ensemble model) whose: