
Commit 16f00c3

Merge pull request #528 from alexrudall/7.3.0

7.3.0

2 parents 4627c94 + f3d4121, commit 16f00c3

13 files changed (+456 / -378 lines)

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [7.3.0] - 2024-10-11
+
+### Added
+
+- Add ability to (with the right incantations) retrieve the chunks used by an Assistant file search - thanks to [@agamble](https://github.com/agamble) for the addition!
+
 ## [7.2.0] - 2024-10-10
 
 ### Added

Gemfile.lock

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    ruby-openai (7.2.0)
+    ruby-openai (7.3.0)
       event_stream_parser (>= 0.3.0, < 2.0.0)
       faraday (>= 1)
       faraday-multipart (>= 1)

README.md

Lines changed: 110 additions & 0 deletions
@@ -1111,6 +1111,116 @@ end

Note that you have 10 minutes to submit your tool output before the run expires.

#### Exploring chunks used in File Search

Take a deep breath. You might need a drink for this one.

OpenAI can share which chunks its internal RAG pipeline used to answer a file search, if you ask for them with the right `include` parameter.

An example spec that does this can be found [here](https://github.com/alexrudall/ruby-openai/blob/main/spec/openai/client/assistant_file_search_spec.rb), just so you know it's possible.
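
The `include` value used in the example below, `step_details.tool_calls[*].file_search.results[*].content`, reads like a path through a run step: every tool call, then its file search results, then each result's content. As a rough local illustration of what it selects (not how OpenAI evaluates it), here's a small sketch. The `values_at_path` helper and the sample step are made up for this purpose; the only key assumed inside `content` is `text`, which is what the example's chunk loop reads.

```ruby
# Hypothetical helper, not part of ruby-openai: follows a dotted path through
# parsed JSON (Hashes and Arrays), where "[*]" after a key means "every element
# of this array". It only illustrates what the include value points at; it is
# not how OpenAI evaluates it.
def values_at_path(node, path)
  path.split(".").reduce([node]) do |nodes, segment|
    key, wildcard = segment.match(/\A(\w+)(\[\*\])?\z/).captures
    nodes.flat_map do |current|
      value = current.is_a?(Hash) ? current[key] : nil
      next [] if value.nil?

      # Expand arrays marked with [*]; otherwise keep the value as one match
      wildcard && value.is_a?(Array) ? value : [value]
    end
  end
end

INCLUDE_PATH = "step_details.tool_calls[*].file_search.results[*].content"

# Made-up run step, trimmed to the keys the README example digs through.
step = {
  "step_details" => {
    "tool_calls" => [
      {
        "file_search" => {
          "results" => [
            { "content" => [{ "text" => "chunk A text" }] },
            { "content" => [{ "text" => "chunk B text" }] }
          ]
        }
      }
    ]
  }
}

# Each match is one result's content array; flatten them and keep the text.
chunks = values_at_path(step, INCLUDE_PATH).flatten.map { |content| content["text"] }
p chunks # prints ["chunk A text", "chunk B text"]
```

Unlike a `dig` with a fixed `0` index, this walks every tool call in a step, and a step without tool calls (such as a message creation step) simply yields no matches.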
1122+
Here's how to get the chunks used in a file search. In this example I'm using [this file](https://css4.pub/2015/textbook/somatosensory.pdf):
```ruby
require "openai"

# Make a client
client = OpenAI::Client.new(
  access_token: "access_token_goes_here",
  log_errors: true # Don't do this in production.
)

# Upload your file(s)
file_id = client.files.upload(
  parameters: {
    file: "path/to/somatosensory.pdf",
    purpose: "assistants"
  }
)["id"]

# Create a vector store to store the vectorised file(s)
vector_store_id = client.vector_stores.create(parameters: {})["id"]

# Vectorise the file(s)
vector_store_file_id = client.vector_store_files.create(
  vector_store_id: vector_store_id,
  parameters: { file_id: file_id }
)["id"]

# Check that the file is vectorised: wait while its status is "in_progress" (it should become "completed")
sleep 1 while client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: vector_store_file_id)["status"] == "in_progress"

# Create an assistant, referencing the vector store
assistant_id = client.assistants.create(
  parameters: {
    model: "gpt-4o",
    name: "Answer finder",
    instructions: "You are a file search tool. Find the answer in the given files, please.",
    tools: [
      { type: "file_search" }
    ],
    tool_resources: {
      file_search: {
        vector_store_ids: [vector_store_id]
      }
    }
  }
)["id"]

# Create a thread with your question
thread_id = client.threads.create(parameters: {
  messages: [
    { role: "user",
      content: "Find the description of a nociceptor." }
  ]
})["id"]

# Run the thread to generate the response. Include the "GIVE ME THE CHUNKS" incantation.
run_id = client.runs.create(
  thread_id: thread_id,
  parameters: {
    assistant_id: assistant_id
  },
  query_parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
)["id"]

# Wait for the run to finish before looking at its steps
sleep 1 while %w[queued in_progress].include?(client.runs.retrieve(thread_id: thread_id, id: run_id)["status"])

# Get the steps that happened in the run
steps = client.run_steps.list(
  thread_id: thread_id,
  run_id: run_id,
  parameters: { order: "asc" }
)

# Get the first step ID (or whichever one you want to look at)
step_id = steps["data"].first["id"]

# Retrieve all the steps. Include the "GIVE ME THE CHUNKS" incantation again.
steps = steps["data"].map do |step|
  client.run_steps.retrieve(
    thread_id: thread_id,
    run_id: run_id,
    id: step["id"],
    parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
  )
end

# Now we've got the chunk info, buried deep. Loop through the steps and find chunks if included:
chunks = steps.flat_map do |step|
  included_results = step.dig("step_details", "tool_calls", 0, "file_search", "results")

  next if included_results.nil? || included_results.empty?

  included_results.flat_map do |result|
    result["content"].map do |content|
      content["text"]
    end
  end
end.compact

# The first chunk will be the closest match to the prompt. Finally, if you want to view the completed message(s):
client.messages.list(thread_id: thread_id)
```
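
Vectorising a file and running a thread both happen asynchronously: a vector store file moves from `in_progress` to a final status such as `completed`, and a run moves from `queued`/`in_progress` to a final status such as `completed`. A naive polling loop waits forever if a status never changes, so here's a minimal bounded sketch. The `wait_for_status` helper, its one-second interval and its five-minute timeout are hypothetical choices, not part of ruby-openai; the block you pass makes the actual API call.

```ruby
# Hypothetical helper, not part of ruby-openai. Calls the block until it
# returns a status that is not in `pending`, sleeping `interval` seconds
# between calls, and raises once `timeout` seconds have passed.
def wait_for_status(pending: %w[queued in_progress], interval: 1, timeout: 300)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = yield
    return status unless pending.include?(status)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Timed out waiting; last status was #{status.inspect}"
    end

    sleep interval
  end
end

# With the client and IDs from the example above (commented out so this
# snippet runs offline):
#
# wait_for_status(pending: ["in_progress"]) do
#   client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: vector_store_file_id)["status"]
# end
#
# run_status = wait_for_status do
#   client.runs.retrieve(thread_id: thread_id, id: run_id)["status"]
# end

# Offline demo with canned statuses standing in for API responses.
canned = %w[queued in_progress completed].each
p wait_for_status(interval: 0) { canned.next } # prints "completed"
```

Because only the listed statuses count as pending, a run that stops at `requires_action` (waiting for tool output) is returned straight away rather than polled until the timeout.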
### Image Generation

Generate images using DALL·E 2 or DALL·E 3!

lib/openai/version.rb

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 module OpenAI
-  VERSION = "7.2.0".freeze
+  VERSION = "7.3.0".freeze
 end

spec/fixtures/cassettes/file_search_assistant_setup.yml renamed to spec/fixtures/cassettes/assistant_file_search_assistant_setup.yml

Lines changed: 12 additions & 12 deletions

spec/fixtures/cassettes/file_search_create_run.yml renamed to spec/fixtures/cassettes/assistant_file_search_create_run.yml

Lines changed: 15 additions & 15 deletions

spec/fixtures/cassettes/file_search_file_setup.yml renamed to spec/fixtures/cassettes/assistant_file_search_file_setup.yml

Lines changed: 12 additions & 12 deletions