[Feature] Add Support for Log Probability Values from the Returned Response of Gemini Models. #238
Comments
Any thoughts about it?
I also think it would be extremely helpful if the API could provide the top-k log probabilities of each predicted token.
Really need this, as I found Gemini to be hallucinating quite a bit on a RAG application. Btw, your title refers to
Thanks for the reminder 🤗
Many hallucination detection approaches rely on log probability as a key feature. It's one of the most essential elements when building a serious product with an LLM.
This would be extremely helpful!
As others have previously mentioned, it would be great to somehow get access to the logprobs in a similar fashion to how OpenAI exposes them for its models. Based on that, we could then calculate, for example, the perplexity score and various other evaluation metrics, as sketched below.
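To make that concrete, here is a minimal sketch of the perplexity calculation this would enable once per-token logprobs are exposed; the token log probability values below are made up purely for illustration:

```python
import math

# Hypothetical per-token log probabilities for one generated sentence
# (values invented for illustration only).
token_logprobs = [-0.12, -0.54, -0.03, -1.20, -0.33]

# Perplexity is the exponential of the negative mean token log probability.
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(f"perplexity: {perplexity:.3f}")
```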
Yep, this would make it possible to use Gemini in production.
+1 - also useful for classification tasks.
+1
+1, it is critical. ChatGPT and other major models all support this feature now.
Can you please specify which major models support this feature? I'm also searching for alternatives. Right now I don't see any other players supporting this except OpenAI. No Anthropic, no Mistral...
b/361194489
Yes, this would allow evaluating Gemini with threshold-free evaluation metrics. That would be excellent.
Google's Vertex AI just launched this; hopefully that means it's coming soon here, but I don't have a timeline.
Can you share a link?
They have added a field 'avgLogprobs' to the response documentation at https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#nodejs, but I am unable to get a response with such a field.
Yeah, this API is related to, but separate from, Vertex. Hopefully this API will catch up soon.
This is fixed in the latest version. Code: #561
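For anyone landing here later, here is a minimal request sketch of how the google-genai SDK exposes these options via `response_logprobs` and `logprobs` on `GenerateContentConfig`; whether the backend actually returns logprobs still depends on the model and API, as discussed in the rest of this thread:

```python
from google import genai
from google.genai import types

# Assumes API credentials (e.g. GOOGLE_API_KEY) are set in the environment.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-1.5-flash",
    contents="What is your name?",
    config=types.GenerateContentConfig(
        response_logprobs=True,  # ask the backend to return log probabilities
        logprobs=2,              # number of top alternative tokens to include
    ),
)

# avg_logprobs is the average log probability over the whole output;
# per-token details, when populated, live on the candidate's logprobs_result.
print(response.candidates[0].avg_logprobs)
```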
Just to make sure I get this right: the logprobs are averaged over the entire output, correct?
I recommend you open another issue to discuss this question =)
Created a convo here actually :)
The feature is only available in certain models. Would it be possible to extend it to the 1.5-pro and 2.0 models?
Thanks for the note @miaojingang! Unfortunately, my experience is a bit different.

TLDR: I did some digging, and here's very likely confirmation that logprobs does not work for Gemini at the moment:

```python
def test_log_probs(client):
  # ML DEV discovery doc supports response_logprobs but the backend
  # does not.
  # TODO: update replay test json files when ML Dev backend is updated.
  with pytest_helper.exception_if_mldev(client, errors.ClientError):
    client.models.generate_content(
        model='gemini-1.5-flash',
        contents='What is your name?',
        config={
            'logprobs': 2,
            'presence_penalty': 0.5,
            'frequency_penalty': 0.5,
            'response_logprobs': True,
        },
    )
```

Perhaps it is in the works.

Long Answer

So there's some ambiguity in how this is being handled. To my knowledge, there was full support for logprobs just a few short months ago on Gemini's
```python
genConfig = types.GenerateContentConfig()
print(dir(genConfig))
```

yields:
The latter has

```python
response = client.models.generate_content(
    model="gemini-2.0-flash-lite",
    contents="Tell me how the internet works, but pretend I'm a puppy who only understands squeaky toys.",
    config=types.GenerateContentConfig(
        temperature=0.4,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        response_logprobs=True
    )
)
print(response.text)
```

yields
And the same for 1.5-flash as well. Not sure what else to try.

@miaojingang, can you share how you are getting the log prob feature even from the models you listed? Versions and example code of what you are doing would be appreciated, as it's challenging to keep track of which version of the Gemini API this applies to. I've tried a few versions of the APIs but can't be certain that a different version won't work better.

Edit: I didn't spend too much time parsing it, but this commit seems very relevant. It looks like the devs tried to add logprobs back but didn't quite get all the way? I'll have to look more deeply through it: googleapis/python-genai@5586f3d
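For anyone else debugging this, a quick sanity check is to inspect the candidate object and see which logprob-related fields actually came back. This is only a sketch based on my reading of the google-genai types; the `avg_logprobs`, `logprobs_result`, and `chosen_candidates` names are assumptions and may differ across SDK versions:

```python
# Inspect what logprob-related data, if any, the response carries.
candidate = response.candidates[0]

print("avg_logprobs:", getattr(candidate, "avg_logprobs", None))

logprobs_result = getattr(candidate, "logprobs_result", None)
if logprobs_result is None:
    print("No per-token logprobs returned by this backend/model.")
else:
    # chosen_candidates: one entry per generated token with its log probability.
    for token_info in logprobs_result.chosen_candidates:
        print(token_info.token, token_info.log_probability)
```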
Thanks @chancharikmitra for the detailed notes. I'm new to all these different implementations and find it confusing. I saw the
@miaojingang how did they make it work?
I was able to get this to work, but only for certain models; according to the documentation, it currently only works for gemini-2.0-flash-lite-001 and gemini-1.5-flash:
It stopped working for
Yes, agreed @tranhoangnguyen03. This seems to be a widespread source of confusion for those who care about this feature, have used it at some point, and are now suddenly seeing it missing despite some remnants of it remaining in the Gemini documentation.

EDIT: The code below works if you use the Vertex AI API, NOT THE GOOGLE GEMINI API. My error was in confusing the two, as I was sure their functionality would be the same. This is a response I posted in another thread: vercel/ai#5418 (comment).
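Since the original snippet from that thread isn't reproduced above, here is a minimal sketch of the distinction using the google-genai SDK; the project, location, and API key values are placeholders, and logprob support may still vary by backend and model:

```python
from google import genai

# Vertex AI backend: this is where the logprobs code from that thread worked.
vertex_client = genai.Client(
    vertexai=True,
    project="your-gcp-project",  # placeholder
    location="us-central1",      # placeholder
)

# Gemini Developer API backend: same SDK and same request shape, but logprob
# support here is exactly what this issue is tracking.
gemini_client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # placeholder
```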
+1 here. I am also not getting the logprobs for the Preview models. I am using the
It works on Vertex: 1 request per day, resets at 12am Pacific.
Description of the feature request:
How about providing support for retrieving the log probability of each predicted token from Google models like Gemini?
Something like the functionality illustrated in this OpenAI post: Using Logprobs.
What problem are you trying to solve with this feature?
Getting the returned log prob of each generated token helps me (and perhaps other users) confirm the confidence of the model's prediction. Further, this feature can help users compute the perplexity of generated sentences to better understand the quality of the textual continuation; a small illustrative sketch follows below.
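As a small illustrative sketch of the confidence use case (not tied to any particular SDK): exponentiating a token's log probability recovers the model's probability for that token, which is what a hallucination or classification-confidence check would threshold on. The values below are invented for illustration:

```python
import math

# Hypothetical (token, logprob) pairs for the top-2 candidates of a
# one-word classification answer; values invented for illustration.
candidates = [("Yes", -0.05), ("No", -3.2)]

for token, logprob in candidates:
    prob = math.exp(logprob)
    print(f"{token!r}: logprob={logprob:.2f} -> probability={prob:.1%}")
```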
Any other information you'd like to share?
No response