Description of the bug:
Greetings,
While experimenting with the GenerationConfig parameters of the Gemini Pro model, I've noticed unexpected variability in outputs generated with the top_k=1 setting, in contrast with the near-deterministic responses observed with top_p=0.
Detailed Explanation:
Testing showed that top_p=0 produces near-identical outputs for the same input, which is expected, since it narrows sampling to the most probable token. The top_k=1 setting, however, does not show the same consistency. This is surprising: with k=1, top-k sampling restricts generation to the single most probable token at each step, which is effectively greedy decoding, so identical requests should likewise yield near-identical outputs.
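To make that expectation concrete, here is a toy sketch (my own illustration, not the actual Gemini decoder) of top-k and top-p filtering as they are commonly defined. With top_k=1 or top_p=0, both should collapse to picking the single most probable token:

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, renormalized."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p (at least one token is always kept), renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cum += prob
        if cum >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Toy next-token distribution (made-up numbers for illustration)
probs = {"star": 0.5, "moon": 0.3, "sun": 0.2}
print(top_k_filter(probs, 1))  # -> {'star': 1.0}: only the argmax survives
print(top_p_filter(probs, 0))  # -> {'star': 1.0}: likewise collapses to argmax
```

Under these common definitions the two settings are interchangeable at their extremes, which is why the divergence in the Gemini outputs below is surprising.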
Below is the Python code snippet illustrating the tests conducted and highlighting the difference in response consistency:
import google.generativeai as genai

model = genai.GenerativeModel(model_name='gemini-pro')
user_message = "Write a one-sentence poem"

# Testing for near-consistent responses with top_p=0
print("\ntop_p=0:")
generation_config = genai.GenerationConfig(top_p=0)
for _ in range(3):
    response = model.generate_content(user_message, generation_config=generation_config)
    print(f'{"_"*20}\n{response.text}')

# Testing for variability with top_k=1
print("\ntop_k=1:")
generation_config = genai.GenerationConfig(top_k=1)
for _ in range(3):
    response = model.generate_content(user_message, generation_config=generation_config)
    print(f'{"_"*20}\n{response.text}')
Output:
top_p=0:
____________________
In the vast expanse, a star whispers its tale.
____________________
In the vast expanse, a star whispers its tale.
____________________
In the vast expanse, a star whispers its tale.
top_k=1:
____________________
In a world of colors, a heart beats, a story unfolds.
____________________
In cosmic expanse, a flicker of light, a tale untold.
____________________
In twilight's embrace, dreams whisper of distant stars.
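For reference, the discrepancy above can be summarized numerically with a small helper of my own (using the sample outputs verbatim): a deterministic setting should yield exactly one distinct text across repeated runs.

```python
from collections import Counter

def distinct_outputs(responses):
    """Count how many distinct texts appear across repeated runs;
    a deterministic configuration should yield exactly 1."""
    return len(Counter(responses))

# The three top_p=0 runs above are identical; the top_k=1 runs all differ.
top_p_runs = ["In the vast expanse, a star whispers its tale."] * 3
top_k_runs = [
    "In a world of colors, a heart beats, a story unfolds.",
    "In cosmic expanse, a flicker of light, a tale untold.",
    "In twilight's embrace, dreams whisper of distant stars.",
]
print(distinct_outputs(top_p_runs))  # -> 1
print(distinct_outputs(top_k_runs))  # -> 3
```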
Actual vs expected behavior:
Expected Behavior:
top_k=1 should result in near-consistent responses for the same input, matching the behavior observed with top_p=0.
Actual Behavior:
The top_k=1 setting exhibits significant variability in responses to identical inputs, contrary to the near-determinism expected.
Any other information you'd like to share?
No response