The strategies for Reliable Long-form Transcription
in whisper.cpp differ from OpenAI's Whisper
#1461
Replies: 3 comments 4 replies
-
The reason the temperature increment is 0.4 is that processing is faster when the fallback triggers. After we add efficient batched decoding, we will reduce it to 0.2. We don't use
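For context, OpenAI's reference implementation retries each segment at successively higher temperatures and accepts the first decode that passes its quality checks. Below is a minimal sketch of that fallback loop; `decode_fn` and the result dict are simplified stand-ins for the real transcribe.py machinery, while the default schedule and thresholds match OpenAI's published defaults:

```python
def decode_with_fallback(
    decode_fn,
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # OpenAI's default schedule
    compression_ratio_threshold=2.4,               # OpenAI's default
    logprob_threshold=-1.0,                        # OpenAI's default
):
    """Retry decoding at increasing temperatures until the result looks sane.

    `decode_fn(t)` is a hypothetical stand-in for the actual model decode;
    here it must return a dict with `compression_ratio` and `avg_logprob`.
    """
    result = None
    for t in temperatures:
        result = decode_fn(t)
        ok = (
            result["compression_ratio"] <= compression_ratio_threshold
            and result["avg_logprob"] >= logprob_threshold
        )
        if ok:
            return result
    # Every temperature failed the checks: return the last attempt anyway.
    return result
```

With a 0.4 increment the schedule effectively collapses to (0.0, 0.4, 0.8), so a failing segment exhausts the fallback rounds sooner, which is the speed trade-off described above.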
-
Hey @ggerganov and @Artoria2e5, just coming across this thread. Not sure what the current thinking is, or the timeline for implementation?
-
Can it be shown, in whisper.cpp output, that this error also leads to erroneous results, or is it noise below threshold?
-
I suddenly wanted to take a closer look at the OpenAI Whisper paper, and one section caught my attention (the part I highlighted in yellow). I then checked the whisper.cpp code and found two main discrepancies: the size of the temperature increment, and the method of calculating the compression ratio. The temperature in whisper.cpp increases by 0.4 each time instead of the 0.2 mentioned in the paper. Additionally, whisper.cpp uses entropy as a substitute for the gzip compression ratio, while OpenAI Whisper actually compresses the text and calculates the real gzip compression ratio. @ggerganov
Temperature:
whisper/transcribe.py
https://github.com/ggerganov/whisper.cpp/blob/0de8582f65bde052b08b1844c39fed8808f75619/whisper.cpp#L3833
https://github.com/ggerganov/whisper.cpp/blob/0de8582f65bde052b08b1844c39fed8808f75619/whisper.cpp#L4545-L4554
Gzip Compression Ratio:
whisper/utils.py
https://github.com/ggerganov/whisper.cpp/blob/0de8582f65bde052b08b1844c39fed8808f75619/whisper.cpp#L4326-L4372
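To make the difference concrete, here is what OpenAI's compression-ratio check computes (mirroring `compression_ratio` in whisper/utils.py, which uses zlib), alongside a plain Shannon-entropy helper of the kind whisper.cpp substitutes for it. The `token_entropy` function is an illustrative sketch in Python, not the actual C++ code linked above:

```python
import math
import zlib

def compression_ratio(text: str) -> float:
    """OpenAI's check: raw UTF-8 byte length divided by zlib-compressed
    length. Repetitive (hallucinated) text compresses well, so a high
    ratio signals a bad decode."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))

def token_entropy(probs) -> float:
    """Shannon entropy of a token probability distribution -- a cheaper
    proxy for repetition: highly repetitive output concentrates probability
    mass and drives the entropy down. (Hypothetical helper for illustration.)"""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

Both quantities flag degenerate, repetitive output, but only the first matches the paper's described threshold of a real gzip-style compression ratio.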