Custom implementations of classification algorithms, including k-nearest neighbors for binary and multi-class problems, a perceptron for binary classification, and a one-vs-rest strategy for extending the perceptron to multi-class scenarios.

JGZimek/classification-algorithms

Classification Algorithms

Overview

This repository contains Python implementations of several classical classification algorithms along with utility scripts for running experiments and generating visualizations. The focus is on a minimal yet educational approach to algorithms such as k-nearest neighbors, a perceptron classifier, and a one-vs-rest strategy for handling multi-class problems. An additional script demonstrates how these implementations compare with ensemble models from scikit-learn.

Project Structure

  • /src: Library modules used by the example scripts.
    • config.py: Paths and default hyperparameters.
    • data/: Dataset loading utilities.
    • models/: Implementations of KNNClassifier, Perceptron, and OneVsRestClassifier.
    • utils/: Simple evaluation metrics (accuracy, precision, recall, F1) and a confusion matrix helper.
    • visualization/: Functions for plotting results and ensuring output directories exist.
  • /scripts: Command line scripts illustrating how to train and evaluate the models.
    • run_knn.py – optimize k and p for k-NN, then evaluate on the wine dataset.
    • run_perceptron.py – tune the learning rate of a perceptron using the banknote dataset.
    • run_one_vs_rest.py – apply a perceptron in a one-vs-rest setup for the wine dataset.
    • run_ensemble.py – compare the custom models with several scikit-learn ensemble methods.
  • /data: Contains the raw datasets used by the scripts (e.g. data_banknote_authentication.csv).
  • /docs: Output directories for figures generated by the example runs.
  • README.md: This file.

Getting Started

  1. Clone the repository

    git clone <repo-url>
    cd classification-algorithms
  2. Install dependencies

    Create a virtual environment and install the required packages (on Windows, activate it with venv\Scripts\activate instead of the source command):

    python -m venv venv
    source venv/bin/activate
    pip install numpy pandas matplotlib seaborn scikit-learn
  3. Run the example scripts

    Each script can be run directly from the repository root. Results (plots and printed metrics) are saved under docs/.

    python scripts/run_knn.py
    python scripts/run_perceptron.py
    python scripts/run_one_vs_rest.py
    python scripts/run_ensemble.py

Usage

The scripts are meant as demonstrations of the provided algorithms. They perform typical data loading, preprocessing and evaluation steps.

  • KNN: loads the wine dataset, splits it into training/validation/test parts, searches for the best k and Minkowski p value, then plots metrics and a t-SNE visualization.
  • Perceptron: uses the banknote authentication dataset and sweeps the learning rate to find the best model.
  • One-vs-Rest: wraps the perceptron for multi-class classification on the wine dataset and shows macro and micro averaged metrics.
  • Ensemble: compares the custom models with scikit-learn random forest, bagging, gradient boosting and others.

Generated plots are stored in the corresponding docs/task_*_results folder.
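
The k and p search described for the KNN script can be illustrated with a short NumPy sketch. This is not the repository's code: the function names `minkowski_knn_predict` and `grid_search_k_p`, the candidate grids, and the toy data are all assumptions made for illustration, and the actual script may select hyperparameters differently.

```python
import numpy as np
from collections import Counter


def minkowski_knn_predict(X_train, y_train, X_query, k=3, p=2):
    """Predict labels by majority vote among the k nearest training points.

    Hypothetical helper, not the repository's KNNClassifier.
    """
    # Pairwise Minkowski distances of order p, shape (n_query, n_train).
    diffs = np.abs(X_query[:, None, :] - X_train[None, :, :])
    dists = (diffs ** p).sum(axis=2) ** (1.0 / p)
    # Indices of the k closest training samples for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; ties go to the label seen first, i.e. the nearer neighbor.
    return np.array([Counter(y_train[row]).most_common(1)[0][0] for row in nearest])


def grid_search_k_p(X_train, y_train, X_val, y_val, ks=(1, 3, 5), ps=(1, 2)):
    """Return the (k, p, accuracy) triple with the highest validation accuracy."""
    best = (None, None, -1.0)
    for k in ks:
        for p in ps:
            pred = minkowski_knn_predict(X_train, y_train, X_val, k=k, p=p)
            acc = float(np.mean(pred == y_val))
            if acc > best[2]:
                best = (k, p, acc)
    return best


# Toy data: two well-separated 2-D clusters (not the wine dataset).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
idx = rng.permutation(40)
train, val = idx[:30], idx[30:]

best_k, best_p, best_acc = grid_search_k_p(X[train], y[train], X[val], y[val])
print(f"best k={best_k}, p={best_p}, validation accuracy={best_acc:.2f}")
```

Selecting k and p on a validation split, as the script description suggests, keeps the test split untouched until the final evaluation.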

Documentation

Models

  • KNNClassifier
    • fit(X, y) – memorize the training data.
    • predict(X) – return predicted labels for new samples.
    • optimize_k and optimize_p – helper functions to evaluate different hyperparameters.
  • Perceptron
    • fit(X, y) – train weights using a simple gradient update rule.
    • predict and predict_proba – produce class labels or raw scores.
    • optimize_learning_rate – sweep over a range of learning rates.
  • OneVsRestClassifier – trains one binary perceptron per class and selects the class with the highest score when predicting.
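
To make the one-vs-rest idea concrete, here is a minimal sketch of one binary perceptron per class with argmax over raw scores. The classes `SketchPerceptron` and `SketchOneVsRest`, the method name `decision_function`, and the default hyperparameters are hypothetical stand-ins; the repository's Perceptron and OneVsRestClassifier may differ in signatures and in the exact update rule.

```python
import numpy as np


class SketchPerceptron:
    """Binary perceptron for labels in {0, 1} (illustrative, not the repo class)."""

    def __init__(self, learning_rate=0.1, n_epochs=200):
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.n_epochs):
            for xi, yi in zip(X, y):
                # error is 0 for a correct prediction, so weights only
                # change when the sample is misclassified.
                error = yi - (1 if xi @ self.w + self.b > 0 else 0)
                self.w += self.learning_rate * error * xi
                self.b += self.learning_rate * error
        return self

    def decision_function(self, X):
        # Raw score: signed, unnormalized distance from the decision boundary.
        return X @ self.w + self.b

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


class SketchOneVsRest:
    """Train one binary perceptron per class; predict the highest-scoring class."""

    def __init__(self, learning_rate=0.1, n_epochs=200):
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Each model sees its own class as 1 and every other class as 0.
        self.models_ = [
            SketchPerceptron(self.learning_rate, self.n_epochs).fit(X, (y == c).astype(int))
            for c in self.classes_
        ]
        return self

    def predict(self, X):
        scores = np.column_stack([m.decision_function(X) for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]


# Toy data: three clusters, each linearly separable from the other two.
X = np.array([
    [0.0, 0.0], [0.5, 0.2], [0.2, 0.5],
    [5.0, 0.0], [5.5, 0.3], [4.8, 0.4],
    [0.0, 5.0], [0.3, 5.5], [0.4, 4.8],
])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

ovr = SketchOneVsRest().fit(X, y)
train_pred = ovr.predict(X)
print(train_pred)
```

Comparing raw scores rather than thresholded labels matters here: several binary models (or none) may claim a sample, and argmax resolves both cases.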

Utilities

  • Metrics: accuracy, precision, recall, F1 and confusion matrix implemented with NumPy.
  • Visualization: plotting functions for pairplots, t-SNE embeddings, confusion matrices and basic metric tables.
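
The difference between macro and micro averaging is easiest to see from the confusion matrix. The sketch below is an assumption about how such metrics can be written with NumPy; the function names `confusion_matrix` and `precision_recall_f1` and the `average` parameter are hypothetical and need not match the helpers in utils/.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes as rows and predicted classes as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def precision_recall_f1(cm, average="macro"):
    """Return (precision, recall, f1) averaged over classes.

    macro: compute each metric per class, then take the unweighted mean.
    micro: pool true/false positives and negatives over classes first.
    """
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    if average == "micro":
        tp, fp, fn = tp.sum(), fp.sum(), fn.sum()
    # A zero denominator implies tp == 0, so dividing by 1 yields 0 safely.
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    if average == "macro":
        return float(precision.mean()), float(recall.mean()), float(f1.mean())
    return float(precision), float(recall), float(f1)


# Toy labels with an imbalanced third class that is never predicted.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
macro = precision_recall_f1(cm, average="macro")
micro = precision_recall_f1(cm, average="micro")
print(cm)
print("macro:", macro)
print("micro:", micro)
```

In single-label multi-class problems, micro-averaged precision and recall both equal accuracy, while macro averaging gives every class the same weight, so a rare class that is always missed pulls the macro scores down.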

Testing

The repository currently does not include automated unit tests. Running the example scripts serves as an integration test of the modules.

Future Extensions

  • Additional algorithms such as logistic regression or SVM.
  • More extensive visualizations and reporting utilities.
  • Unit tests and continuous integration configuration.

Contact

For questions, feel free to open an issue or reach out to the repository owner.
