feat: options and ChatCompletionRequest add property enable_thinking #2940
base: main
Conversation
… enable_thinking is used to control whether the Qwen3 model enables the thinking mode. Signed-off-by: xuanmiss <xuanmine@gmail.com>
Commits 5b67970 to 0681646
/**
 * Whether to enable the thinking mode
 */
private @JsonProperty("enable_thinking") Boolean enableThinking;
I'm not sure what to do with these differences emerging, particularly in the reasoning models. This option is not part of the OpenAI API.
Maybe we can have a subclass of OpenAiChatOptions, such as QwenAiChatOptions?
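For illustration, a minimal sketch of that subclass idea — QwenAiChatOptions is hypothetical and does not exist in Spring AI today; only the enable_thinking field itself comes from this PR:

import com.fasterxml.jackson.annotation.JsonProperty;
import org.springframework.ai.openai.OpenAiChatOptions;

// Hypothetical subclass carrying the Qwen-specific option, keeping the
// OpenAI-compatible options untouched in the parent class.
public class QwenAiChatOptions extends OpenAiChatOptions {

    /**
     * Whether to enable the thinking mode (serialized as "enable_thinking",
     * matching the Qwen3 request parameter).
     */
    private @JsonProperty("enable_thinking") Boolean enableThinking;

    public Boolean getEnableThinking() {
        return this.enableThinking;
    }

    public void setEnableThinking(Boolean enableThinking) {
        this.enableThinking = enableThinking;
    }
}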
How about utilizing something like the template method pattern? Apart from the OpenAI-compatible APIs, most models generally differ only in a few fields on the request and response objects.
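As a rough sketch of that idea (the names are made up for illustration, not existing Spring AI code): the base class fixes the OpenAI-compatible request skeleton and exposes a single hook for provider-specific extras.

import java.util.HashMap;
import java.util.Map;

// Template method sketch: build(...) assembles the shared OpenAI-compatible
// fields, then delegates to a hook that subclasses override.
abstract class ChatRequestBuilder {

    final Map<String, Object> build(String model, double temperature) {
        Map<String, Object> body = new HashMap<>();
        body.put("model", model);
        body.put("temperature", temperature);
        addProviderSpecificParams(body); // hook for per-provider differences
        return body;
    }

    // Default: no extra parameters beyond the OpenAI-compatible ones.
    protected void addProviderSpecificParams(Map<String, Object> body) {
    }
}

// Qwen3 differs from plain OpenAI only in this one request field.
class Qwen3RequestBuilder extends ChatRequestBuilder {

    @Override
    protected void addProviderSpecificParams(Map<String, Object> body) {
        body.put("enable_thinking", true);
    }
}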
@apappascs can you elaborate more please?
Thank you for the contribution @xuanmiss. Could you please add some integration tests? Given the documentation, it's not so clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.
As a temporary solution, you can add …
This does seem a bit complicated. Although various model providers and deployment inference services such as vLLM and SGLang are compatible with OpenAI's API format and protocol, there can still be parameter differences depending on the provider and model. For example, as shown in the linked documentation, a Qwen3 model deployed with vLLM places additional parameters in chat_template_kwargs. I tested an inference API service from ModelScope with the following parameter structure:

curl --request POST \
  --url https://api-inference.modelscope.cn/v1/chat/completions \
  --header 'Authorization: Bearer token' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {
        "role": "user",
        "content": "Give me a short introduction to large language models."
      }
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "stream": true,
    "max_tokens": 8192,
    "presence_penalty": 1.5,
    "enable_thinking": true
  }'

Therefore, we might need to consider how to handle this more appropriately, ensuring sufficient flexibility for both the caller and client sides.
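To make the contrast concrete, here is a small Java illustration of the two body shapes discussed above (built with Jackson purely for demonstration; the nested shape follows the linked Qwen/vLLM docs, the flat shape follows the ModelScope call above):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class EnableThinkingShapes {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // vLLM-style: the flag is nested under chat_template_kwargs.
        String vllmStyle = mapper.writeValueAsString(Map.of(
                "model", "Qwen/Qwen3-8B",
                "chat_template_kwargs", Map.of("enable_thinking", false)));

        // ModelScope-style: the flag sits at the top level of the request body.
        String flatStyle = mapper.writeValueAsString(Map.of(
                "model", "Qwen/Qwen3-8B",
                "enable_thinking", true)));

        System.out.println(vllmStyle);
        System.out.println(flatStyle);
    }
}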
related issue: #2941
enable_thinking is used to control whether the Qwen3 model enables the thinking mode.