
Support microbenchmarking for low precision training #2101


Draft: jainapurva wants to merge 7 commits into memory_profiler

Conversation

jainapurva (Contributor)

This pull request extends the microbenchmarking framework with support for training benchmarks, expanded configuration options, and improved profiling capabilities. The changes make the framework more flexible and usable, and enable benchmarking of a wider range of scenarios, particularly training with low-precision data types such as float8.

Enhancements to Benchmarking Framework

Training Benchmarks

  • Added support for training benchmarks, including forward and backward pass performance, float8-specific configurations, and profiling capabilities. This includes a new TrainingBenchmarkConfig class and associated logic in benchmark_runner.py to load and execute training benchmarks.
  • Introduced YAML-based configuration for training benchmarks with options for quantization, scaling types, granularity, and matrix shapes.
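A training-benchmark entry in such a YAML file might look like the sketch below. The key names are illustrative assumptions based on the options listed above (quantization, scaling type, granularity, matrix shapes), not the exact schema introduced by this PR:

```yaml
# Hypothetical training benchmark entry; key names are assumptions.
benchmark_mode: training
quantization: float8
scaling_type: dynamic        # float8 scaling strategy
granularity: tensorwise      # per-tensor vs. per-row scaling
matrix_shapes:
  - name: pow2               # one of: llama, pow2, pow2_extended, sweep
    min_power: 10
    max_power: 14
enable_profiler: true
enable_memory_profiler: true
```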

Expanded Configuration Options

  • Added new shape generation methods (llama, pow2, pow2_extended, sweep) for benchmarking different matrix dimensions. These methods enable more comprehensive testing of model performance across a variety of scenarios.
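The power-of-two generators can be sketched as follows. This is a minimal illustration of the naming scheme, not the implementation from benchmark_runner.py; the parameter names and the exact midpoint rule for pow2_extended are assumptions:

```python
# Sketch of pow2-style shape generators for (M, K, N) GEMM shapes.
# Parameter names and semantics are assumptions, not the PR's code.
from typing import List, Tuple

Shape = Tuple[int, int, int]

def pow2_shapes(min_power: int = 10, max_power: int = 14) -> List[Shape]:
    """Square shapes at powers of two, e.g. 1024 through 16384."""
    return [(2**p, 2**p, 2**p) for p in range(min_power, max_power + 1)]

def pow2_extended_shapes(min_power: int = 10, max_power: int = 14) -> List[Shape]:
    """Each power of two plus the midpoint to the next power (2^p and 1.5 * 2^p)."""
    shapes: List[Shape] = []
    for p in range(min_power, max_power + 1):
        base = 2**p
        shapes.append((base, base, base))
        mid = base + base // 2  # e.g. 1024 -> 1536
        shapes.append((mid, mid, mid))
    return shapes
```

A sweep generator would instead take the cross-product of candidate M, K, and N values, and a llama generator would return the fixed projection shapes of a Llama-style transformer block.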
  • Introduced profiling options (enable_profiler, enable_memory_profiler) to enable both standard and CUDA memory profiling, with outputs saved for further analysis.

Code Refactoring and Improvements

Profiling Enhancements

  • Updated benchmark_inference.py to integrate memory profiling (generate_memory_profile, visualize_memory_profile) and improve error handling for profiling-related operations.
  • Replaced the create_model_and_input function with create_model_and_input_data to improve clarity and maintainability.
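The improved error handling around profiling can be illustrated with a small defensive wrapper: a profiling failure is reported but never aborts the benchmark run. The names generate_memory_profile and visualize_memory_profile come from the PR description; here they are passed in as callables so the sketch stays self-contained, and the signature is an assumption:

```python
# Defensive profiling pattern: log profiling failures, keep benchmarking.
def run_with_memory_profile(generate_fn, visualize_fn, *args):
    """Run a profile-generation step and its visualization, swallowing errors.

    generate_fn / visualize_fn stand in for generate_memory_profile and
    visualize_memory_profile; returns the profile path, or None on failure.
    """
    try:
        profile_path = generate_fn(*args)
        visualize_fn(profile_path)
        return profile_path
    except Exception as exc:
        print(f"Memory profiling failed; continuing benchmark: {exc}")
        return None
```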

Modularization

  • Refactored benchmark_runner.py to separate training and inference benchmark logic, ensuring better readability and support for future extensions.

These changes collectively make the benchmarking framework more robust, extensible, and capable of handling diverse use cases.

@jainapurva jainapurva added topic: for developers Use this tag if this PR is mainly developer facing topic: performance Use this tag if this PR improves the performance of a feature labels Apr 22, 2025
@jainapurva jainapurva requested a review from Copilot April 22, 2025 19:21

pytorch-bot bot commented Apr 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2101

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6970052 with merge base 25034e5 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 22, 2025

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enhances the microbenchmarking framework by introducing training benchmarks, new configuration options for low-precision training, and extended profiling support including CUDA memory profiling. Key changes include:

  • Addition of a new TrainingBenchmarkConfig class and associated result processing for training benchmarks.
  • Updates to model creation and test files to support LNLinearActivationModel and new shape generation methods.
  • Expansion of benchmarking utilities and runner scripts to handle training-specific parameters and profiling.

Reviewed Changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated no comments.

Summary per file:

  • torchao/testing/model_architectures.py: Introduces LNLinearActivationModel and updates model creation logic.
  • benchmarks/microbenchmarks/utils.py: Adds memory profiling parameters and extends result dictionaries.
  • test/test_model_architecture.py: Updates test cases to use LNLinearActivationModel instead of LNLinearSigmoid.
  • benchmarks/microbenchmarks/test/*: Adds tests for training benchmarks and verifies TOPS metrics.
  • benchmarks/microbenchmarks/benchmark_runner.py: Adds support for training mode and new shape generators.
  • README.md: Documents training benchmarks and updated configuration options.
Files not reviewed (1)
  • benchmarks/microbenchmarks/test/benchmark_config.yml: Language not supported
Comments suppressed due to low confidence (2)

test/test_model_architecture.py:160

  • [nitpick] The test method name and inline documentation still reference 'LNLinearSigmoid' despite using LNLinearActivationModel; consider updating the test case name and associated comments to reflect the new model naming.
model = LNLinearActivationModel(fc_dim1=64, fc_dim2=32, dtype=torch.float32)

benchmarks/microbenchmarks/benchmark_runner.py:161

  • [nitpick] The return type 'List[Any]' could be more precisely typed as List[Union[BenchmarkConfig, TrainingBenchmarkConfig]] to improve type safety and maintainability.
def load_benchmark_configs(cli_args: argparse.Namespace) -> List[Any]:
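Copilot's typing suggestion can be sketched as below. The stub classes and the mode-based dispatch are assumptions standing in for the real config types and loader logic:

```python
# Sketch of the suggested precise return type, with stub config classes.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class BenchmarkConfig:        # stub for the real inference config class
    name: str

@dataclass
class TrainingBenchmarkConfig:  # stub for the real training config class
    name: str

ConfigType = Union[BenchmarkConfig, TrainingBenchmarkConfig]

def load_benchmark_configs(modes: List[str]) -> List[ConfigType]:
    """Return the appropriate config subtype for each requested mode."""
    return [
        TrainingBenchmarkConfig(m) if m == "training" else BenchmarkConfig(m)
        for m in modes
    ]
```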

@jainapurva jainapurva requested a review from HDCharles April 22, 2025 19:33
@jainapurva jainapurva changed the base branch from main to memory_profiler April 22, 2025 20:04