
Support microbenchmarking for low precision training #2101


Draft: jainapurva wants to merge 7 commits into memory_profiler

Conversation

jainapurva (Contributor)

This pull request extends the microbenchmarking framework with support for training benchmarks, expanded configuration options, and improved profiling capabilities. The changes make the framework more flexible and usable, and enable benchmarking of a wider range of scenarios, particularly training with low-precision data types such as float8.

Enhancements to Benchmarking Framework

Training Benchmarks

  • Added support for training benchmarks, including forward and backward pass performance, float8-specific configurations, and profiling capabilities. This includes a new TrainingBenchmarkConfig class and associated logic in benchmark_runner.py to load and execute training benchmarks.
  • Introduced YAML-based configuration for training benchmarks with options for quantization, scaling types, granularity, and matrix shapes.
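A training-benchmark entry in such a YAML file might look like the sketch below. The key names are illustrative assumptions based on the options listed above (quantization, scaling type, granularity, matrix shapes), not the exact schema introduced by this PR:

```yaml
# Hypothetical training benchmark entry; key names are assumptions.
benchmark_mode: training
quantization: float8
scaling_type: dynamic        # float8 scaling strategy
granularity: tensorwise      # per-tensor vs. per-row scaling
matrix_shapes:
  - name: pow2               # one of: llama, pow2, pow2_extended, sweep
    min_power: 10
    max_power: 14
enable_profiler: true
enable_memory_profiler: true
```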

Expanded Configuration Options

  • Added new shape generation methods (llama, pow2, pow2_extended, sweep) for benchmarking different matrix dimensions. These methods enable more comprehensive testing of model performance across a variety of scenarios.
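The power-of-two generators can be sketched as follows. This is a minimal illustration of the naming scheme, not the implementation from benchmark_runner.py; the parameter names and the exact midpoint rule for pow2_extended are assumptions:

```python
# Sketch of pow2-style shape generators for (M, K, N) GEMM shapes.
# Parameter names and semantics are assumptions, not the PR's code.
from typing import List, Tuple

Shape = Tuple[int, int, int]

def pow2_shapes(min_power: int = 10, max_power: int = 14) -> List[Shape]:
    """Square shapes at powers of two, e.g. 1024 through 16384."""
    return [(2**p, 2**p, 2**p) for p in range(min_power, max_power + 1)]

def pow2_extended_shapes(min_power: int = 10, max_power: int = 14) -> List[Shape]:
    """Each power of two plus the midpoint to the next power (2^p and 1.5 * 2^p)."""
    shapes: List[Shape] = []
    for p in range(min_power, max_power + 1):
        base = 2**p
        shapes.append((base, base, base))
        mid = base + base // 2  # e.g. 1024 -> 1536
        shapes.append((mid, mid, mid))
    return shapes
```

A sweep generator would instead take the cross-product of candidate M, K, and N values, and a llama generator would return the fixed projection shapes of a Llama-style transformer block.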
  • Introduced profiling options (enable_profiler, enable_memory_profiler) to enable both standard and CUDA memory profiling, with outputs saved for further analysis.

Code Refactoring and Improvements

Profiling Enhancements

  • Updated benchmark_inference.py to integrate memory profiling (generate_memory_profile, visualize_memory_profile) and improve error handling for profiling-related operations.
  • Replaced the create_model_and_input function with create_model_and_input_data to improve clarity and maintainability.
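The improved error handling around profiling can be illustrated with a small defensive wrapper: a profiling failure is reported but never aborts the benchmark run. The names generate_memory_profile and visualize_memory_profile come from the PR description; here they are passed in as callables so the sketch stays self-contained, and the signature is an assumption:

```python
# Defensive profiling pattern: log profiling failures, keep benchmarking.
def run_with_memory_profile(generate_fn, visualize_fn, *args):
    """Run a profile-generation step and its visualization, swallowing errors.

    generate_fn / visualize_fn stand in for generate_memory_profile and
    visualize_memory_profile; returns the profile path, or None on failure.
    """
    try:
        profile_path = generate_fn(*args)
        visualize_fn(profile_path)
        return profile_path
    except Exception as exc:
        print(f"Memory profiling failed; continuing benchmark: {exc}")
        return None
```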

Modularization

  • Refactored benchmark_runner.py to separate training and inference benchmark logic, ensuring better readability and support for future extensions.

These changes collectively make the benchmarking framework more robust, extensible, and capable of handling diverse use cases.

@jainapurva jainapurva added topic: for developers Use this tag if this PR is mainly developer facing topic: performance Use this tag if this PR improves the performance of a feature labels Apr 22, 2025
@jainapurva jainapurva requested a review from Copilot April 22, 2025 19:21

pytorch-bot bot commented Apr 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2101

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6970052 with merge base 25034e5 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 22, 2025

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enhances the microbenchmarking framework by introducing training benchmarks, new configuration options for low-precision training, and extended profiling support including CUDA memory profiling. Key changes include:

  • Addition of a new TrainingBenchmarkConfig class and associated result processing for training benchmarks.
  • Updates to model creation and test files to support LNLinearActivationModel and new shape generation methods.
  • Expansion of benchmarking utilities and runner scripts to handle training-specific parameters and profiling.

Reviewed Changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated no comments.

Summary per file:

  • torchao/testing/model_architectures.py: Introduces LNLinearActivationModel and updates model creation logic.
  • benchmarks/microbenchmarks/utils.py: Adds memory profiling parameters and extends result dictionaries.
  • test/test_model_architecture.py: Updates test cases to use LNLinearActivationModel instead of LNLinearSigmoid.
  • benchmarks/microbenchmarks/test/*: Adds tests for training benchmarks and verifies TOPS metrics.
  • benchmarks/microbenchmarks/benchmark_runner.py: Adds support for training mode and new shape generators.
  • README.md: Documents training benchmarks and updated configuration options.
Files not reviewed (1)
  • benchmarks/microbenchmarks/test/benchmark_config.yml: Language not supported
Comments suppressed due to low confidence (2)

test/test_model_architecture.py:160

  • [nitpick] The test method name and inline documentation still reference 'LNLinearSigmoid' despite using LNLinearActivationModel; consider updating the test case name and associated comments to reflect the new model naming.
model = LNLinearActivationModel(fc_dim1=64, fc_dim2=32, dtype=torch.float32)

benchmarks/microbenchmarks/benchmark_runner.py:161

  • [nitpick] The return type 'List[Any]' could be more precisely typed as List[Union[BenchmarkConfig, TrainingBenchmarkConfig]] to improve type safety and maintainability.
def load_benchmark_configs(cli_args: argparse.Namespace) -> List[Any]:
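Copilot's typing suggestion can be sketched as below. The stub classes and the mode-based dispatch are assumptions standing in for the real config types and loader logic:

```python
# Sketch of the suggested precise return type, with stub config classes.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class BenchmarkConfig:        # stub for the real inference config class
    name: str

@dataclass
class TrainingBenchmarkConfig:  # stub for the real training config class
    name: str

ConfigType = Union[BenchmarkConfig, TrainingBenchmarkConfig]

def load_benchmark_configs(modes: List[str]) -> List[ConfigType]:
    """Return the appropriate config subtype for each requested mode."""
    return [
        TrainingBenchmarkConfig(m) if m == "training" else BenchmarkConfig(m)
        for m in modes
    ]
```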

@jainapurva jainapurva requested a review from HDCharles April 22, 2025 19:33
@jainapurva jainapurva changed the base branch from main to memory_profiler April 22, 2025 20:04