Support microbenchmarking for low precision training #2101
base: memory_profiler
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2101
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6970052 with merge base 25034e5.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull Request Overview
This PR enhances the microbenchmarking framework by introducing training benchmarks, new configuration options for low-precision training, and extended profiling support including CUDA memory profiling. Key changes include:
- Addition of a new TrainingBenchmarkConfig class and associated result processing for training benchmarks.
- Updates to model creation and test files to support LNLinearActivationModel and new shape generation methods.
- Expansion of benchmarking utilities and runner scripts to handle training-specific parameters and profiling.
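For orientation, the training-specific options described above could be expressed in a benchmark_config.yml entry roughly like the sketch below. Only the shape generator names (llama, pow2, pow2_extended, sweep) and the two profiler flags come from this PR's description; every other field name is an assumption for illustration, not the PR's actual schema.

```yaml
# Hypothetical training benchmark entry -- field names other than the shape
# generator names and profiler flags are illustrative assumptions.
benchmark_mode: "training"
model_params:
  - name: "float8_linear_training"
    matrix_shapes:
      - name: "pow2_extended"      # one of: llama, pow2, pow2_extended, sweep
    high_precision_dtype: "torch.bfloat16"
    device: "cuda"
    enable_profiler: true          # standard profiler traces
    enable_memory_profiler: true   # CUDA memory profiling output for later analysis
```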
Reviewed Changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated no comments.
File | Description |
---|---|
torchao/testing/model_architectures.py | Introduces LNLinearActivationModel and updates model creation logic. |
benchmarks/microbenchmarks/utils.py | Adds memory profiling parameters and extends result dictionaries. |
test/test_model_architecture.py | Updates test cases to use LNLinearActivationModel instead of LNLinearSigmoid. |
benchmarks/microbenchmarks/test/* | Adds tests for training benchmarks and verifies TOPS metrics. |
benchmarks/microbenchmarks/benchmark_runner.py | Adds support for training mode and new shape generators. |
README.md | Documents training benchmarks and updated configuration options. |
Files not reviewed (1)
- benchmarks/microbenchmarks/test/benchmark_config.yml: Language not supported
Comments suppressed due to low confidence (2)
test/test_model_architecture.py:160
- [nitpick] The test method name and inline documentation still reference 'LNLinearSigmoid' despite using LNLinearActivationModel; consider updating the test case name and associated comments to reflect the new model naming.
model = LNLinearActivationModel(fc_dim1=64, fc_dim2=32, dtype=torch.float32)
benchmarks/microbenchmarks/benchmark_runner.py:161
- [nitpick] The return type 'List[Any]' could be more precisely typed as Union[BenchmarkConfig, TrainingBenchmarkConfig] to improve type safety and maintainability.
def load_benchmark_configs(cli_args: argparse.Namespace) -> List[Any]:
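A sketch of the narrower signature this nitpick suggests; the import location of the two config classes is an assumption for illustration.

```python
import argparse
from typing import List, Union

# Assumed import path for the two config classes (illustrative only).
from benchmarks.microbenchmarks.utils import BenchmarkConfig, TrainingBenchmarkConfig


def load_benchmark_configs(
    cli_args: argparse.Namespace,
) -> List[Union[BenchmarkConfig, TrainingBenchmarkConfig]]:
    """Build one config object per benchmark entry parsed from the CLI/YAML inputs."""
    ...
```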
This pull request introduces significant enhancements to the microbenchmarking framework, including support for training benchmarks, expanded configuration options, and improved profiling capabilities. The changes make the framework more flexible and allow it to benchmark a wider range of scenarios, particularly training with low-precision data types such as float8.
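For orientation, the kind of workload these training benchmarks target looks roughly like the single float8 training step below. The sketch assumes torchao's convert_to_float8_training entry point and a recent CUDA GPU; the model, shapes, and optimizer are illustrative and not taken from this PR.

```python
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

# Toy model standing in for the benchmarked architectures (illustrative only).
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to("cuda", dtype=torch.bfloat16)

# Swap eligible nn.Linear modules for float8 training linears.
convert_to_float8_training(model)
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

# One training step: forward, loss, backward, optimizer update.
optimizer.zero_grad(set_to_none=True)
loss = model(x).float().sum()
loss.backward()
optimizer.step()
```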
Enhancements to Benchmarking Framework

Training Benchmarks
- Added a new TrainingBenchmarkConfig class and associated logic in benchmark_runner.py to load and execute training benchmarks.

Expanded Configuration Options
- Added new shape generation methods (llama, pow2, pow2_extended, sweep) for benchmarking different matrix dimensions. These methods enable more comprehensive testing of model performance across a variety of scenarios (see the sketch below).
- Added profiling options (enable_profiler, enable_memory_profiler) to enable both standard and CUDA memory profiling, with outputs saved for further analysis.
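The shape generators themselves live in benchmark_runner.py; as a rough illustration of the idea only (the ranges and the exact pow2_extended rule below are assumptions, not the PR's implementation), power-of-two style generators produce (M, K, N) triples such as:

```python
from typing import List, Tuple


def get_pow2_shapes(min_power: int = 10, max_power: int = 14) -> List[Tuple[int, int, int]]:
    """Square (M, K, N) shapes whose dimensions are powers of two (1024 ... 16384)."""
    return [(2**p, 2**p, 2**p) for p in range(min_power, max_power + 1)]


def get_pow2_extended_shapes(min_power: int = 10, max_power: int = 14) -> List[Tuple[int, int, int]]:
    """Powers of two plus the midpoint to the next power (1024, 1536, 2048, 3072, ...)."""
    shapes: List[Tuple[int, int, int]] = []
    for p in range(min_power, max_power + 1):
        shapes.append((2**p,) * 3)
        shapes.append((2**p + 2 ** (p - 1),) * 3)
    return shapes


print(get_pow2_extended_shapes(10, 11))
# [(1024, 1024, 1024), (1536, 1536, 1536), (2048, 2048, 2048), (3072, 3072, 3072)]
```

Presumably, sweep enumerates combinations over a range and llama returns shapes from a Llama-style model's projection/MLP layers; both readings are inferred from the names, not from the PR.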
Code Refactoring and Improvements

Profiling Enhancements
- Updated benchmark_inference.py to integrate memory profiling (generate_memory_profile, visualize_memory_profile) and improve error handling for profiling-related operations (a generic sketch of this pattern appears at the end of this description).
- Replaced the create_model_and_input function with create_model_and_input_data to improve clarity and maintainability.

Modularization
- Refactored benchmark_runner.py to separate training and inference benchmark logic, ensuring better readability and support for future extensions.

These changes collectively make the benchmarking framework more robust, extensible, and capable of handling diverse use cases.
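As context for the memory-profiling hooks mentioned above (generate_memory_profile, visualize_memory_profile): CUDA memory profiling in PyTorch is commonly built on the memory-snapshot API, which records allocation history and dumps a pickle that can be inspected at pytorch.org/memory_viz. The sketch below shows that generic pattern and is not the PR's implementation.

```python
import torch


def run_with_memory_snapshot(fn, snapshot_path: str = "memory_snapshot.pickle") -> str:
    """Record CUDA allocation history around fn() and dump it for offline visualization."""
    torch.cuda.memory._record_memory_history(max_entries=100_000)
    try:
        fn()
        torch.cuda.synchronize()
    finally:
        torch.cuda.memory._dump_snapshot(snapshot_path)
        torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
    return snapshot_path
```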