ML Workflow with Filecoin & Lighthouse Storage

This project demonstrates a complete machine learning workflow leveraging decentralized storage through Filecoin and Lighthouse Storage. It showcases how to store datasets and trained ML models on the decentralized Filecoin network, providing persistence, availability, and verifiability for machine learning assets.

Overview

The workflow includes the following steps:

Dataset Creation & Upload: Generates a synthetic diabetes prediction dataset and uploads it to Filecoin via Lighthouse
Model Training: Downloads the dataset from Filecoin and trains a Random Forest classifier
Model Storage: Uploads the trained model and its metadata to Filecoin
Model Inference: Downloads the model and performs predictions on sample data

Prerequisites

Python 3.8+

Required Python packages:

pip install lighthouseweb3 scikit-learn pandas numpy joblib requests

A Lighthouse Storage API key (sign up at lighthouse.storage)

Installation

Clone this repository
Set the LIGHTHOUSE_API_KEY variable in the script to your own Lighthouse API key

Usage

Run the script:

python modelTrainingDemo.py

Workflow Details

1. Dataset Management

The script first creates a synthetic diabetes prediction dataset with features like age, BMI, blood pressure, and glucose levels. It then uploads this dataset to Filecoin via Lighthouse Storage, generating a Content Identifier (CID) that can be used to retrieve the dataset later.
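As a rough illustration of this step, the sketch below builds a small synthetic dataset and wraps the upload in a helper. The column names, distributions, and labeling rule are assumptions made for the example, not the ones used in modelTrainingDemo.py. The `Lighthouse(token=...).upload(source=...)` call and the `["data"]["Hash"]` response field reflect how the lighthouseweb3 SDK is commonly used; check them against the SDK version you install.

```python
import numpy as np
import pandas as pd


def make_diabetes_dataset(n_samples: int = 500, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic dataset; the distributions and label rule are illustrative."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(20, 80, size=n_samples),
        "bmi": rng.normal(27.0, 5.0, size=n_samples).round(1),
        "blood_pressure": rng.normal(80.0, 12.0, size=n_samples).round(0),
        "glucose": rng.normal(110.0, 30.0, size=n_samples).round(0),
    })
    # Hypothetical labeling rule: higher glucose, BMI, and age raise a risk score.
    risk = (
        0.02 * (df["glucose"] - 110)
        + 0.05 * (df["bmi"] - 27)
        + 0.01 * (df["age"] - 50)
    )
    noise = rng.normal(0.0, 0.5, size=n_samples)
    df["diabetes"] = ((risk + noise) > 0).astype(int)
    return df


def upload_to_lighthouse(path: str, api_key: str) -> str:
    """Upload a file and return its CID (assumed SDK call and response shape)."""
    # Imported lazily so the dataset helper above works without the SDK installed.
    from lighthouseweb3 import Lighthouse

    response = Lighthouse(token=api_key).upload(source=path)
    return response["data"]["Hash"]
```

A typical use would be to call `make_diabetes_dataset().to_csv("diabetes_dataset.csv", index=False)` and then pass that path to `upload_to_lighthouse`, keeping the returned CID for the training step.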

2. Model Training

The script downloads the dataset from Filecoin using its CID, then trains a Random Forest classifier to predict diabetes from those features and reports the model's accuracy.
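The training step could look roughly like the sketch below. The gateway base URL, the `diabetes` target column, the 80/20 held-out split, and the hyperparameters are all assumptions for illustration; the script may evaluate accuracy differently.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed gateway base URL; any IPFS gateway that serves the CID should work.
GATEWAY = "https://gateway.lighthouse.storage/ipfs/"


def dataset_url(cid: str) -> str:
    """Return the gateway URL for a CID."""
    return GATEWAY + cid


def train_classifier(df: pd.DataFrame, target: str = "diabetes", seed: int = 42):
    """Fit a Random Forest and return it with its accuracy on a held-out split."""
    X = df.drop(columns=[target])
    y = df[target]
    # Hold out 20% of the rows, keeping the class balance in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy
```

Because `pandas.read_csv` accepts URLs, the dataset can be loaded with `pd.read_csv(dataset_url(cid))` and passed straight to `train_classifier`.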

3. Model Storage

Both the trained model (serialized with joblib) and a JSON metadata file describing the model's architecture, features, and performance are uploaded to Filecoin. This gives your ML assets long-term, verifiable storage.
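One possible shape for the metadata file is sketched below. The field names are hypothetical and chosen for the example; the script's actual metadata schema may differ.

```python
import json
from datetime import datetime, timezone


def build_metadata(model_type, feature_names, accuracy, hyperparameters):
    """Collect JSON-serializable facts about a trained model (hypothetical schema)."""
    return {
        "model_type": model_type,
        "features": list(feature_names),
        "accuracy": round(float(accuracy), 4),
        "hyperparameters": dict(hyperparameters),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def save_metadata(metadata, path):
    """Write the metadata next to the serialized model so both can be uploaded."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2)
```

The model itself can be written with `joblib.dump(model, "model.joblib")`, after which both files are uploaded the same way as the dataset. Storing the dataset CID in the metadata as well would tie each model to the exact data it was trained on.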

4. Model Inference

Finally, the script demonstrates how to download the model from Filecoin and use it to make predictions on new data. This shows the complete lifecycle of ML assets on decentralized storage.
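A minimal sketch of the inference step follows, assuming the model was serialized with joblib and is fetched over HTTP from a gateway. The gateway URL and helper names are illustrative, and `urllib` from the standard library is used here in place of `requests` to keep the example dependency-light.

```python
import io
import urllib.request

import joblib
import pandas as pd

# Assumed gateway base URL; any IPFS gateway that serves the CID should work.
GATEWAY = "https://gateway.lighthouse.storage/ipfs/"


def download_bytes(cid: str, timeout: float = 60.0) -> bytes:
    """Fetch the raw bytes stored under a CID through the gateway."""
    with urllib.request.urlopen(GATEWAY + cid, timeout=timeout) as response:
        return response.read()


def load_model(raw: bytes):
    """Deserialize a joblib-serialized model from bytes.

    joblib.load unpickles data and can run arbitrary code,
    so only load models from CIDs you trust.
    """
    return joblib.load(io.BytesIO(raw))


def predict_samples(model, samples: pd.DataFrame) -> pd.DataFrame:
    """Return predicted labels and positive-class probabilities for each row."""
    result = samples.copy()
    result["prediction"] = model.predict(samples)
    # Column 1 is the probability of the second class (label 1 in a 0/1 problem).
    result["probability"] = model.predict_proba(samples)[:, 1]
    return result
```

The sample DataFrame must use the same feature columns, in the same order, as the data the model was trained on.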

Benefits of Using Filecoin for ML Workflows

Persistence: Models and datasets are stored on a decentralized network designed for long-term storage
Verifiability: Each CID is derived from a hash of the content, so retrieved data can be checked against the CID used to request it
Availability: Data can be accessed from anywhere through IPFS gateways
Censorship Resistance: No single entity controls access to your ML assets

Security Note

The example includes an API key for demonstration purposes. In a real implementation, you should:

Never hardcode API keys in your source code
Use environment variables or a secure configuration management system
Consider setting up access controls for your uploaded ML assets
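The first two recommendations can be followed with a small helper like the one below. It assumes the key is exported in an environment variable named `LIGHTHOUSE_API_KEY`; the helper name is hypothetical.

```python
import os


def get_api_key(env_var: str = "LIGHTHOUSE_API_KEY") -> str:
    """Read the API key from the environment instead of the source code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the script."
        )
    return key
```

On Linux or macOS the variable can be set for one session with `export LIGHTHOUSE_API_KEY=...` before running the script.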

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

FIL-Builders/model-filecoin-workflow
