#dataset-generator

Here are 49 public repositories matching this topic...

This repository hosts a comprehensive suite for generating graph-based entity summarization datasets from user-selected Wikipedia pages. Using a series of interconnected modules, it draws on Wikidata and Wikipedia dumps to construct a dataset along with auto-generated ground truths.

  • Updated Jun 24, 2024
  • Python
ImageFromTextGenerator

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

  • Updated Apr 3, 2025
  • Python

Sudoku4LLM is a Sudoku dataset generator for training and evaluating reasoning in Large Language Models (LLMs). It offers customizable puzzles, difficulty levels, and 11 serialization formats to support structured data reasoning and Chain of Thought (CoT) experiments.

  • Updated Apr 28, 2025
  • Python
