Releases · KernelTuner/kernel_tuner

21 May 12:21

benvanwerkhoven

1.1.3

d8b4f9a

Version 1.1.3 Latest

This release contains a number of small bugfixes and adds support for Nvidia Blackwell GPUs.

What's Changed

Resolve deprecation warnings of regex library by @emmanuel-ferdman in #296
Support three-digit compute capability by @csbnw in #299
Add support for half and bfloat16 scalars in pyCUDA backend by @stijnh in #300
Fix issue #245 by @stijnh in #302
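
To illustrate the three-digit compute capability change: Blackwell GPUs report compute capabilities such as 10.0 and 12.0, which no longer fit in a two-digit string. The sketch below is not Kernel Tuner's actual code. The helper names are hypothetical, and it assumes the minor version is always a single digit.

```python
from typing import Tuple


def join_compute_capability(major: int, minor: int) -> str:
    """Hypothetical helper: (8, 6) -> "86", (10, 0) -> "100"."""
    return f"{major}{minor}"


def split_compute_capability(cc: str) -> Tuple[int, int]:
    """Hypothetical helper: split "86" or "100" into (major, minor).

    Assumption: the last character is the minor version and everything
    before it is the major version, so the string length is not fixed.
    """
    if len(cc) < 2 or not cc.isdigit():
        raise ValueError(f"unexpected compute capability string: {cc!r}")
    return int(cc[:-1]), int(cc[-1])


# Slicing from the end handles two-digit and three-digit values alike,
# whereas indexing cc[0] and cc[1] would misread "100" as (1, 0).
older_gpu = split_compute_capability("86")
blackwell = split_compute_capability("100")
```

Splitting from the end of the string, rather than assuming a fixed width, keeps the same code path working for both older and newer devices.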

New Contributors

@emmanuel-ferdman made their first contribution in #296

Full Changelog: 1.1.2...1.1.3

Contributors

stijnh, csbnw, and emmanuel-ferdman

08 Apr 08:48

benvanwerkhoven

1.1.2

ea29129

Version 1.1.2

This release would not have been necessary if I had not forgotten to increment the version number on the previous release that I made 20 minutes ago. Alas, we all make mistakes sometimes.

08 Apr 08:25

benvanwerkhoven

1.1.1

ece0719

Version 1.1.1

The sole purpose of this release is to support NumPy 2.0 and newer. The main motivation is to get the examples and tutorial notebooks working again on Google Colab.

What's Changed

Numpy2 support by @benvanwerkhoven in #295

Full Changelog: 1.1.0...1.1.1

Contributors

benvanwerkhoven

04 Apr 10:10

benvanwerkhoven

1.1.0

85da990

Version 1.1.0

This release integrates many smaller changes that have been made over the past year.

The most significant new features are:

The NCUObserver to include performance metrics from the Nvidia Profiler during tuning
TegraObserver to read/set clock frequencies, power and temperature on Nvidia Jetson GPUs

In addition, a lot of work has been put into several backends, including OpenACC, the compiler backend, and the HIP backend.

Thanks to everyone who contributed to Kernel Tuner in the past year!

What's Changed

Add Tegra Observer to control clocks on Jetson devices by @loostrum in #243
Catch RuntimeError when importing from pyhip by @loostrum in #252
Bump pillow from 10.2.0 to 10.3.0 by @dependabot in #249
Read instant power in pwr_usage by @csbnw in #247
Bump idna from 3.6 to 3.7 by @dependabot in #250
Register observer & correct clock setting by @fjwillemsen in #242
Compiler backend uses g++ instead of gcc by @benvanwerkhoven in #254
Improved OpenACC support by @isazi in #248
Small improvements to searchspaces and simulation mode by @fjwillemsen in #251
Simplify contributing info by @benvanwerkhoven in #255
Support Python 3.12 and drop Python 3.8 by @benvanwerkhoven in #256
Support Python 3.12 and drop Python 3.8 (2) by @fjwillemsen in #260
Add NCUObserver by @csbnw in #253
Update PMTObserver for latest PMT changes by @csbnw in #261
OpenACC bug fixing by @isazi in #262
ESiWACE3 hackathon by @isazi in #267
fix reading of graphics and memory clocks by @benvanwerkhoven in #271
Directives: summer refactoring by @isazi in #269
Tegra observer by @MartijnFr in #270
Tegra observer with continuous observer by @benvanwerkhoven in #275
base implementation for pmt continuous observer by @benvanwerkhoven in #276
Add support for float16 to HIP backend by @loostrum in #280
Fix: out-of-date PMTContinuousObserver readings by @wvbbreu in #283
Hip local memory error handling by @MiloLurati in #284
Replacing PyHIP with new official python wrapper of ROCm HIP by @MiloLurati in #285
update observer to latest python bindings by @benvanwerkhoven in #279
add support for any case spelling of block size name defaults by @benvanwerkhoven in #277
update documentation by @benvanwerkhoven in #293
Updated pyproject to use hip-python from testpypi by @fjwillemsen in #294

New Contributors

@MartijnFr made their first contribution in #270
@wvbbreu made their first contribution in #283

Full Changelog: 1.0...1.1.0

Contributors

isazi, benvanwerkhoven, and 7 other contributors

04 Apr 20:03

benvanwerkhoven

1.0

1c96693

Version 1.0

Finally, the Version 1.0 release is here! The software has been stable and ready for production use for quite some time now, and after about half a year in beta, we are confident that the current version deserves to mark the first major release of Kernel Tuner.

Version 1.0 integrates a lot of new functionality, including blazing fast search space construction, support for tuning HIP kernels on AMD GPUs, new functionality for mixed precision and accuracy tuning, experimental support for tuning OpenACC programs, a conda package installer for Kernel Tuner, and many more changes and additions.

I would like to thank everyone involved in the development of Kernel Tuner over the past years! Special thanks to the Kernel Tuner developers team for their continued support of the project!

From the Changelog

HIP backend to support tuning HIP kernels on AMD GPUs
Experimental features for mixed-precision and accuracy tuning
Experimental features for OpenACC tuning
Major speedup due to new parser and using revamped python-constraint for searchspace building
Implemented ability to use PySMT and ATF for searchspace building
Added Poetry for dependency and build management
Switched from setup.py and setup.cfg to pyproject.toml for centralized metadata, added relevant tests
Updated GitHub Action workflows to use Poetry
Updated dependencies, most notably NumPy is no longer version-locked as scikit-opt is no longer a dependency
Documentation now uses pyproject.toml metadata, minor fixes and changes to be compatible with updated dependencies
Set up Nox for testing on all supported Python versions in isolated environments
Added linting information, VS Code settings and recommendations
Discontinued use of OrderedDict, as all dictionaries in the Python versions used are already ordered
Dropped Python 3.7 support
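
The OrderedDict item in the list above relies on the language guarantee that a plain dict preserves insertion order from Python 3.7 onward. Below is a minimal sketch of that behavior. The parameter names and values are illustrative and are not taken from the Kernel Tuner code base.

```python
from collections import OrderedDict

# Illustrative tunable parameters; names and values are made up for this example.
tune_params = {}
tune_params["block_size_x"] = [32, 64, 128]
tune_params["block_size_y"] = [1, 2, 4]
tune_params["tile_size"] = [1, 2]

# A plain dict iterates in insertion order, just like an OrderedDict.
plain_order = list(tune_params)
ordered_order = list(OrderedDict(tune_params))
same_order = plain_order == ordered_order

# One remaining difference: dict equality ignores order,
# while equality between two OrderedDicts is order-sensitive.
swapped = {
    "tile_size": [1, 2],
    "block_size_y": [1, 2, 4],
    "block_size_x": [32, 64, 128],
}
dict_equal = tune_params == swapped
ordered_equal = OrderedDict(tune_params) == OrderedDict(swapped)
```

Code that only needs a stable iteration order can therefore use a plain dict. Code that compares mappings and cares about key order would still need OrderedDict or an explicit comparison of the key lists.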

Merged Pull Requests

HIP Backend by @MiloLurati in #199
Accuracy tuning by @stijnh in #189
Fix issue where HIP backend fails due to invalid arguments type by @stijnh in #216
Searchspace improvements and project meta modernization by @fjwillemsen in #214
Minor bugfix by @isazi in #219
OpenACC support by @isazi in #197
Fixed broken tests as per issue #217 by @fjwillemsen in #220
Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
expand documentation on backends by @benvanwerkhoven in #213
Add support for passing cupy arrays to "C" lang by @bouweandela in #226
improve code quality of cache file related functions by @benvanwerkhoven in #240
New readme by @benvanwerkhoven in #231

New Contributors

@MiloLurati made their first contribution in #199
@dependabot made their first contribution in #222
@bouweandela made their first contribution in #226

Full Changelog: 0.4.5...1.0

Contributors

isazi, stijnh, and 5 other contributors

07 Dec 08:19

fjwillemsen

1.0.0b6

66428e3

Version 1.0.0b6 Pre-release

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

Inclusion of tests in the source package, as requested in #225
Updated dependencies

01 Nov 14:11

fjwillemsen

1.0.0b5

08fb58e

Version 1.0.0b5 Pre-release

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

Expanded documentation on backends by @benvanwerkhoven in #213
A fix for an issue that could cause incorrect conversion to Constraint
Extended tests to detect this
Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in #222
Updated dependencies

Full Changelog: 1.0.0b4...1.0.0b5

Contributors

benvanwerkhoven and dependabot

22 Oct 14:11

fjwillemsen

1.0.0b4

d36a5eb

Version 1.0.0b4 Pre-release

This is a beta release for early access to the new features. Not intended for production use.

This release contains several improvements:

nvidia-ml-py added to tutorial extra dependencies.
Additional checks for coherent Poetry configuration and warning in case of outdated development environment.
Updated dependencies.

12 Oct 13:02

fjwillemsen

1.0.0b3

e980b23

Version 1.0.0b3 Pre-release

This is a beta release for early access to the new features. Not intended for production use.

This version contains several bugfixes:

Fix snap_to_nearest on non-numeric parameters by @stijnh in #221
Fixed an issue where some restrictions would not be recognized by the old check_restrictions function.
Fixed an issue where bayes_opt would not handle pruned parameters correctly.

Full Changelog: 1.0.0b2...1.0.0b3

Contributors

stijnh

11 Oct 16:37

fjwillemsen

1.0.0b2

0e009fd

Version 1.0.0b2 Pre-release

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: 1.0.0b1...1.0.0b2
