Releases: KernelTuner/kernel_tuner
Version 1.1.3
This release contains a number of small bugfixes and adds support for Nvidia Blackwell GPUs.
What's Changed
- Resolve deprecation warnings of regex library by @emmanuel-ferdman in #296
- Support three-digit compute capability by @csbnw in #299
- Add support for half and bfloat16 scalars in pyCUDA backend by @stijnh in #300
- Fix issue #245 by @stijnh in #302
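Blackwell GPUs report compute capabilities such as 10.0 or 12.0, which become the three-digit strings "100" and "120" when the dot is dropped (as in "sm_120"). Any code that assumes two digits then breaks. The sketch below illustrates the general idea in Python. The helper names `cc_to_string` and `parse_arch` and the regex are assumptions for illustration. They are not the code changed in #299.

```python
import re
from typing import Optional

# Hypothetical helper (not part of Kernel Tuner's API): turn a CUDA compute
# capability (major, minor) into the digit string used in "sm_XY" targets.
def cc_to_string(major: int, minor: int) -> str:
    return f"{major}{minor}"

# Accept two or three digits, so both "sm_86" (Ampere) and "sm_120"
# (Blackwell) are captured in full. A pattern restricted to \d{2}
# would silently truncate "sm_120" to "12".
ARCH_PATTERN = re.compile(r"sm_(\d{2,3})")

def parse_arch(flag: str) -> Optional[str]:
    """Return the compute capability digits from a flag like '-arch=sm_120'."""
    match = ARCH_PATTERN.search(flag)
    return match.group(1) if match else None

print(cc_to_string(12, 0))          # 120
print(parse_arch("-arch=sm_120"))   # 120
print(parse_arch("-arch=sm_86"))    # 86
```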
New Contributors
- @emmanuel-ferdman made their first contribution in #296
Full Changelog: 1.1.2...1.1.3
Version 1.1.2
This release would not have been necessary if I had not forgotten to increment the version number on the previous release that I made 20 minutes ago. Alas, we all make mistakes sometimes.
Version 1.1.1
The sole purpose of this release is to support Numpy 2.0 and newer. The main motivation is to make the examples and tutorial notebooks work again on Google Colab.
What's Changed
- Numpy2 support by @benvanwerkhoven in #295
Full Changelog: 1.1.0...1.1.1
Version 1.1.0
This release integrates many smaller changes that have been made over the past year.
The most significant new features are:
- The NCUObserver to include performance metrics from the Nvidia Profiler during tuning
- TegraObserver to read/set clock frequencies, power and temperature on Nvidia Jetson GPUs
In addition, a lot of work has been put into several backends, including OpenACC, the compiler backend, the HIP backend and so on.
Thanks to everyone who contributed to Kernel Tuner in the past year!
What's Changed
- Add Tegra Observer to control clocks on Jetson devices by @loostrum in #243
- Catch RuntimeError when importing from pyhip by @loostrum in #252
- Bump pillow from 10.2.0 to 10.3.0 by @dependabot in #249
- Read instant power in pwr_usage by @csbnw in #247
- Bump idna from 3.6 to 3.7 by @dependabot in #250
- Register observer & correct clock setting by @fjwillemsen in #242
- Compiler backend uses g++ instead of gcc by @benvanwerkhoven in #254
- Improved OpenACC support by @isazi in #248
- Small improvements to searchspaces and simulation mode by @fjwillemsen in #251
- Simplify contributing info by @benvanwerkhoven in #255
- Support Python 3.12 and drop Python 3.8 by @benvanwerkhoven in #256
- Support Python 3.12 and drop Python 3.8 (2) by @fjwillemsen in #260
- Add NCUObserver by @csbnw in #253
- Update PMTObserver for latest PMT changes by @csbnw in #261
- OpenACC bug fixing by @isazi in #262
- ESiWACE3 hackathon by @isazi in #267
- fix reading of graphics and memory clocks by @benvanwerkhoven in #271
- Directives: summer refactoring by @isazi in #269
- Tegra observer by @MartijnFr in #270
- Tegra observer with continuous observer by @benvanwerkhoven in #275
- base implementation for pmt continuous observer by @benvanwerkhoven in #276
- Add support for float16 to HIP backend by @loostrum in #280
- Fix: out-of-date PMTContinuousObserver readings by @wvbbreu in #283
- Hip local memory error handling by @MiloLurati in #284
- Replacing PyHIP with new official python wrapper of ROCm HIP by @MiloLurati in #285
- update observer to latest python bindings by @benvanwerkhoven in #279
- add support for any case spelling of block size name defaults by @benvanwerkhoven in #277
- update documentation by @benvanwerkhoven in #293
- Updated pyproject to use hip-python from testpypi by @fjwillemsen in #294
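Case-insensitive matching of the default block size names (#277) can be pictured as a lowercase lookup. The following is a minimal sketch. It assumes the documented defaults block_size_x, block_size_y and block_size_z. The helper `find_block_size_keys` is hypothetical and is not Kernel Tuner's implementation.

```python
from typing import Dict, List, Optional

# Kernel Tuner's default names for the thread block dimensions.
DEFAULT_BLOCK_SIZE_NAMES = ["block_size_x", "block_size_y", "block_size_z"]

def find_block_size_keys(tune_params: Dict[str, list]) -> List[Optional[str]]:
    """Hypothetical helper: for each default name, return the key actually used
    in tune_params regardless of its case, or None if that dimension is not tuned."""
    by_lowercase = {key.lower(): key for key in tune_params}
    return [by_lowercase.get(name) for name in DEFAULT_BLOCK_SIZE_NAMES]

tune_params = {"BLOCK_SIZE_X": [32, 64, 128], "Block_Size_Y": [1, 2, 4]}
print(find_block_size_keys(tune_params))  # ['BLOCK_SIZE_X', 'Block_Size_Y', None]
```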
New Contributors
- @MartijnFr made their first contribution in #270
- @wvbbreu made their first contribution in #283
Full Changelog: 1.0...1.1.0
Version 1.0
Finally, the Version 1.0 release is here! The software has been stable and ready for production use for quite some time now. After about half a year in beta, we are confident that the current version deserves to mark the first major release of Kernel Tuner.
Version 1.0 integrates a lot of new functionality, including blazing fast search space construction, support for tuning HIP kernels on AMD GPUs, new functionality for mixed precision and accuracy tuning, experimental support for tuning OpenACC programs, a conda package installer for Kernel Tuner, and many more changes and additions.
I would like to thank everyone involved in the development of Kernel Tuner over the past years! Special thanks to the Kernel Tuner developers team for their continued support of the project!
From the Changelog
- HIP backend to support tuning HIP kernels on AMD GPUs
- Experimental features for mixed-precision and accuracy tuning
- Experimental features for OpenACC tuning
- Major speedup due to new parser and using revamped python-constraint for searchspace building
- Implemented ability to use PySMT and ATF for searchspace building
- Added Poetry for dependency and build management
- Switched from setup.py and setup.cfg to pyproject.toml for centralized metadata, added relevant tests
- Updated GitHub Action workflows to use Poetry
- Updated dependencies, most notably NumPy is no longer version-locked as scikit-opt is no longer a dependency
- Documentation now uses pyproject.toml metadata, minor fixes and changes to be compatible with updated dependencies
- Set up Nox for testing on all supported Python versions in isolated environments
- Added linting information, VS Code settings and recommendations
- Discontinued use of OrderedDict, as all dictionaries in the Python versions used are already ordered
- Dropped Python 3.7 support
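Here is a brute-force Python sketch of what search space construction with restrictions computes. It is deliberately naive: it enumerates the full Cartesian product of the tunable parameters and filters it. Kernel Tuner 1.0 instead builds the search space with a revamped python-constraint solver. The function name `build_searchspace` is hypothetical. The restrictions are assumed to be Python expression strings over the parameter names. Plain dicts keep insertion order in all supported Python versions, so the parameter order is preserved without OrderedDict.

```python
from itertools import product
from typing import Dict, List

def build_searchspace(tune_params: Dict[str, list], restrictions: List[str]) -> List[dict]:
    """Hypothetical brute-force sketch: keep every configuration in the Cartesian
    product of tune_params for which all restriction expressions are True."""
    names = list(tune_params)  # plain dict: insertion order is guaranteed
    space = []
    for values in product(*tune_params.values()):
        config = dict(zip(names, values))
        # Evaluate each restriction with builtins disabled; only parameter names are visible.
        if all(eval(expr, {"__builtins__": {}}, config) for expr in restrictions):
            space.append(config)
    return space

tune_params = {"block_size_x": [32, 64, 128], "block_size_y": [1, 2, 4, 8]}
space = build_searchspace(tune_params, ["block_size_x * block_size_y <= 256"])
print(len(space))  # 9
print(space[0])    # {'block_size_x': 32, 'block_size_y': 1}
```

A solver-based approach avoids materializing the full product, which matters once the number of parameters grows. The brute-force version is only meant to make the semantics of restrictions concrete.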
Merged Pull Requests
- HIP Backend by @MiloLurati in #199
- Accuracy tuning by @stijnh in #189
- Fix issue where HIP backend fails due to invalid arguments type by @stijnh in #216
- Searchspace improvements and project meta modernization by @fjwillemsen in #214
- Minor bugfix by @isazi in #219
- OpenACC support by @isazi in #197
- Fixed broken tests as per issue #217 by @fjwillemsen in #220
- Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
- expand documentation on backends by @benvanwerkhoven in #213
- Add support for passing cupy arrays to "C" lang by @bouweandela in #226
- improve code quality of cache file related functions by @benvanwerkhoven in #240
- New readme by @benvanwerkhoven in #231
New Contributors
- @MiloLurati made their first contribution in #199
- @dependabot made their first contribution in #222
- @bouweandela made their first contribution in #226
Full Changelog: 0.4.5...1.0
Version 1.0.0b6
This is a beta release for early access to the new features. Not intended for production use.
The release contains:
- Inclusion of tests in the source package, as requested in #225
- Updated dependencies
Version 1.0.0b5
This is a beta release for early access to the new features. Not intended for production use.
The release contains:
- Expanded documentation on backends by @benvanwerkhoven in #213
- A fix for an issue that could cause incorrect conversion to Constraint
- Extended tests to detect this
- Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in #222
- Updated dependencies
Full Changelog: 1.0.0b4...1.0.0b5
Version 1.0.0b4
This is a beta release for early access to the new features. Not intended for production use.
This release contains several improvements:
- nvidia-ml-py added to tutorial extra dependencies.
- Additional checks for coherent Poetry configuration and warning in case of outdated development environment.
- Updated dependencies.
Version 1.0.0b3
This is a beta release for early access to the new features. Not intended for production use.
This version contains several bugfixes:
- Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
- Fixed an issue where some restrictions would not be recognized by the old check_restrictions function.
- Fixed an issue where bayes_opt would not handle pruned parameters correctly.
Full Changelog: 1.0.0b2...1.0.0b3
Version 1.0.0b2
This is a beta release for early access to the new features. Not intended for production use.
Full Changelog: 1.0.0b1...1.0.0b2