feat: options and ChatCompletionRequest add property enable_thinking #2940

Open

wants to merge 1 commit into main

Conversation

xuanmiss commented Apr 29, 2025

related issue: #2941

enable_thinking is used to control whether the Qwen3 model enables the thinking mode.

Thank you for taking time to contribute this pull request!
You might have already read the [contributor guide][1], but as a reminder, please make sure to:

  • Sign the contributor license agreement
  • Rebase your changes on the latest main branch and squash your commits
  • Add/Update unit tests as needed
  • Run a build and make sure all tests pass prior to submission

… enable_thinking is used to control whether the Qwen3 model enables the thinking mode.

Signed-off-by: xuanmiss <xuanmine@gmail.com>
/**
* Whether to enable the thinking mode
*/
private @JsonProperty("enable_thinking") Boolean enableThinking;
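
For illustration, a hypothetical usage sketch (the builder method names are illustrative and not part of this commit's diff; it assumes OpenAiChatOptions exposes the new field through its builder):

// Hypothetical usage sketch, not part of this PR. Assumes a builder method
// mirroring the new field; method names may differ by Spring AI version.
OpenAiChatOptions options = OpenAiChatOptions.builder()
    .model("Qwen/Qwen3-8B")
    .enableThinking(false) // would serialize as "enable_thinking": false
    .build();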
Member

I'm not sure what to do with these differences emerging, in particular in the reasoning models. This option is not part of the OpenAI API.

Maybe we can have a subclass of OpenAiChatOptions such as QwenAiChatOptions?
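
A minimal sketch of what such a subclass might look like (purely illustrative; it assumes OpenAiChatOptions can be subclassed and that the Qwen-specific field would live in the subclass rather than the base class):

import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical sketch, not an existing Spring AI class. Jackson would pick up
// the extra property when the options are serialized into the request body.
public class QwenAiChatOptions extends OpenAiChatOptions {

    /**
     * Whether to enable the thinking mode (Qwen3-specific).
     */
    @JsonProperty("enable_thinking")
    private Boolean enableThinking;

    public Boolean getEnableThinking() {
        return this.enableThinking;
    }

    public void setEnableThinking(Boolean enableThinking) {
        this.enableThinking = enableThinking;
    }
}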

Contributor


How about utilizing something like the template pattern? Beyond the OpenAI-compatible APIs, most models in general only have a few differences in their request and response objects.
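
A rough sketch of that idea (hypothetical class and method names, not an existing Spring AI API): the base class builds the shared OpenAI-compatible request and leaves a single hook for provider-specific fields.

import java.util.Map;

// Hypothetical template-method sketch. buildCommonRequest() produces the
// OpenAI-compatible fields; customize() is the per-provider extension point.
public abstract class OpenAiCompatibleRequestBuilder {

    public final Map<String, Object> build(String userMessage) {
        Map<String, Object> request = buildCommonRequest(userMessage);
        customize(request); // e.g. a Qwen3 subclass adds "enable_thinking"
        return request;
    }

    protected abstract Map<String, Object> buildCommonRequest(String userMessage);

    // Default: no provider-specific fields.
    protected void customize(Map<String, Object> request) {
    }
}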

Member

@apappascs can you elaborate more please?

@apappascs
Contributor

Thank you for the contribution, @xuanmiss. Could you please add some integration tests?

Given the documentation, it's not so clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

As a temporary solution, you can add /think at the end of your prompt.

@xuanmiss
Author

xuanmiss commented May 7, 2025

Thank you for the contribution, @xuanmiss. Could you please add some integration tests?

Given the documentation, it's not so clear that this is the correct structure: https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes.

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "chat_template_kwargs": {"enable_thinking": false}
}'

As a temporary solution, you can add /think at the end of your prompt.

This does seem a bit complicated. Although various model providers and deployed inference services such as vLLM and SGLang are compatible with OpenAI's API format and protocol, there can still be some parameter differences depending on the provider and model. For example, as the documentation shows, the Qwen3 model deployed with vLLM places additional parameters in chat_template_kwargs. I tested an inference API service from ModelScope, which uses the following parameter structure:

curl --request POST \
  --url https://api-inference.modelscope.cn/v1/chat/completions \
  --header 'Authorization: Bearer token' \
  --header 'Content-Type: application/json' \
  --data '{
	"model": "Qwen/Qwen3-8B",
	"messages": [
		{
			"role": "user",
			"content": "Give me a short introduction to large language models."
		}
	],
	"temperature": 0.7,
	"top_p": 0.8,
	"top_k": 20,
	"stream": true,
	"max_tokens": 8192,
	"presence_penalty": 1.5,
	"enable_thinking": true
}'

Therefore, we might need to consider how to handle this more appropriately, ensuring sufficient flexibility for both the caller and client sides.
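
One possibility (purely illustrative, not an existing Spring AI option) would be a generic map of extra request fields that gets merged into the serialized body, so callers can choose whichever placement their provider expects:

// Hypothetical sketch: a catch-all map of additional request fields.
Map<String, Object> extraBody = new HashMap<>();

// vLLM-style placement:
extraBody.put("chat_template_kwargs", Map.of("enable_thinking", false));

// ModelScope-style top-level placement:
// extraBody.put("enable_thinking", true);

// The request object could then merge these entries into the JSON body,
// for example by exposing the map through Jackson's @JsonAnyGetter.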
