A Doctor for your data
Updated Jan 14, 2025 - Python
A curated, but incomplete, list of data-centric AI resources.
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets.
Contains implementations of data-centric approaches for improving semantic segmentation on satellite imagery.
A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
Code for our paper "Towards Trustworthy Dataset Distillation" (Pattern Recognition 2025)
Enhancing Efficiency in Multidevice Federated Learning through Data Selection
TRIAGE: Characterizing and auditing training data for improved regression (NeurIPS 2023)
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling
A multi-view panorama of Data-Centric AI: Techniques, Tools, and Applications (ECAI Tutorial 2024)
Data clustering using the Expectation-Maximization algorithm. To cite this Original Software Publication: https://www.sciencedirect.com/science/article/pii/S2352711021001771
Implementation of data typology for imbalanced datasets.