feat: options and ChatCompletionRequest add property enable_thinking #2940

Open

wants to merge 1 commit into main

Conversation

xuanmiss commented Apr 29, 2025

related issue: #2941

enable_thinking is used to control whether the Qwen3 model enables the thinking mode.

Thank you for taking time to contribute this pull request!
You might have already read the [contributor guide][1], but as a reminder, please make sure to:

  • Sign the contributor license agreement
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

… enable_thinking is used to control whether the Qwen3 model enables the thinking mode.

Signed-off-by: xuanmiss <xuanmine@gmail.com>
/**
* Whether to enable the thinking mode
*/
private @JsonProperty("enable_thinking") Boolean enableThinking;
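
For illustration, a hypothetical usage sketch (the builder method names are illustrative and not part of this commit's diff; it assumes OpenAiChatOptions exposes the new field through its builder):

// Hypothetical usage sketch, not part of this PR. Assumes a builder method
// mirroring the new field; method names may differ by Spring AI version.
OpenAiChatOptions options = OpenAiChatOptions.builder()
    .model("Qwen/Qwen3-8B")
    .enableThinking(false) // would serialize as "enable_thinking": false
    .build();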
Member

I'm not sure what to do with these differences emerging, in particular in the reasoning models. This option is not part of the OpenAI API.

Maybe we can have a subclass of OpenAiChatOptions such as QwenAiChatOptions?
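
A minimal sketch of what such a subclass might look like (purely illustrative; it assumes OpenAiChatOptions can be subclassed and that the Qwen-specific field would live in the subclass rather than the base class):

import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical sketch, not an existing Spring AI class. Jackson would pick up
// the extra property when the options are serialized into the request body.
public class QwenAiChatOptions extends OpenAiChatOptions {

    /**
     * Whether to enable the thinking mode (Qwen3-specific).
     */
    @JsonProperty("enable_thinking")
    private Boolean enableThinking;

    public Boolean getEnableThinking() {
        return this.enableThinking;
    }

    public void setEnableThinking(Boolean enableThinking) {
        this.enableThinking = enableThinking;
    }
}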

Contributor


How about utilizing something like the template pattern? Beyond the OpenAI-compatible APIs, most models in general only have a few differences in their request and response objects.
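
A rough sketch of that idea (hypothetical class and method names, not an existing Spring AI API): the base class builds the shared OpenAI-compatible request and leaves a single hook for provider-specific fields.

import java.util.Map;

// Hypothetical template-method sketch. buildCommonRequest() produces the
// OpenAI-compatible fields; customize() is the per-provider extension point.
public abstract class OpenAiCompatibleRequestBuilder {

    public final Map<String, Object> build(String userMessage) {
        Map<String, Object> request = buildCommonRequest(userMessage);
        customize(request); // e.g. a Qwen3 subclass adds "enable_thinking"
        return request;
    }

    protected abstract Map<String, Object> buildCommonRequest(String userMessage);

    // Default: no provider-specific fields.
    protected void customize(Map<String, Object> request) {
    }
}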

Member

@apappascs can you elaborate more please?

@apappascs
Contributor

Thank you for the contribution, @xuanmiss. Could you please add some integration tests?

Given the documentation, it's not so clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

As a temporary solution, you can add /think at the end of your prompt.

@xuanmiss
Author

xuanmiss commented May 7, 2025

Thank you for the contribution, @xuanmiss. Could you please add some integration tests?

Given the documentation, it's not so clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

As a temporary solution, you can add /think at the end of your prompt.

This does seem a bit complicated. Although various model providers and deployed inference services such as vLLM and SGLang are compatible with OpenAI's API format and protocol, there can still be some parameter differences depending on the provider and model. For example, as the documentation shows, the Qwen3 model deployed with vLLM places additional parameters in chat_template_kwargs. I tested an inference API service from ModelScope, which uses the following parameter structure:

curl --request POST \
  --url https://api-inference.modelscope.cn/v1/chat/completions \
  --header 'Authorization: Bearer token' \
  --header 'Content-Type: application/json' \
  --data '{
	"model": "Qwen/Qwen3-8B",
	"messages": [
		{
			"role": "user",
			"content": "Give me a short introduction to large language models."
		}
	],
	"temperature": 0.7,
	"top_p": 0.8,
	"top_k": 20,
	"stream": true,
	"max_tokens": 8192,
	"presence_penalty": 1.5,
	"enable_thinking": true
}'

Therefore, we might need to consider how to handle this more appropriately, ensuring sufficient flexibility for both the caller and client sides.
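
One possibility (purely illustrative, not an existing Spring AI option) would be a generic map of extra request fields that gets merged into the serialized body, so callers can choose whichever placement their provider expects:

// Hypothetical sketch: a catch-all map of additional request fields.
Map<String, Object> extraBody = new HashMap<>();

// vLLM-style placement:
extraBody.put("chat_template_kwargs", Map.of("enable_thinking", false));

// ModelScope-style top-level placement:
// extraBody.put("enable_thinking", true);

// The request object could then merge these entries into the JSON body,
// for example by exposing the map through Jackson's @JsonAnyGetter.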
