503 The service is currently unavailable when using Context caching Feature #500
Comments
I'm experiencing the same issue when using models/gemini-1.5-pro-001 and trying to cache roughly 300k tokens, even though that model has an input token limit of 2,097,152. |
Thank you for reporting this issue. This looks like an intermittent error and should work now. Automatic retry logic has been added to the SDK to avoid these errors, and you can follow the google-gemini/cookbook#469 FR for examples of retry logic. Thanks |
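For reference, here is a minimal sketch of what such retry logic might look like with this SDK. It assumes the failure surfaces as google.api_core.exceptions.ServiceUnavailable (as the traceback further down confirms); the backoff parameters are illustrative, not taken from the cookbook example, and `contents` stands in for whatever you intend to cache:

```python
import google.generativeai as genai
from google.api_core import exceptions, retry

# Retry only transient 503s, with exponential backoff (illustrative values).
retry_503 = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    initial=1.0,     # first wait, in seconds
    multiplier=2.0,  # backoff factor
    maximum=32.0,    # cap on any single wait
    timeout=300.0,   # give up after 5 minutes overall
)

# Wrap the failing call; `contents` is whatever you intended to cache.
create_with_retry = retry_503(genai.caching.CachedContent.create)
cache = create_with_retry(model="gemini-1.5-flash-001", contents=contents)
```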
@singhniraj08 I checked again, and it seems that the same error is still occurring... I looked at the retry logic example in google-gemini/cookbook#469, which seems to apply when using |
I'm receiving this error too |
I'm experiencing the same issue. Context caching with PDF files raises HTTP 503, while directly injecting strings into the cache works. Any update on this issue? |
Also having the exact same issue with a collection of large PDFs. |
This might be an internal API error because I've gotten 503 using TypeScript. |
I'm also having this issue. Note that this still doesn't work even if you base64-encode the PDFs.

```python
import os
from pathlib import Path
from base64 import b64encode
import google.generativeai as genai
os.environ["GOOGLE_API_KEY"] = "xxxxxx"
pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]
encoded_pdfs = [b64encode(path.read_bytes()).decode("utf-8") for path in pdf_paths]
contents = [
    {
        "role": "user",
        "parts": [{"inline_data": {"mime_type": "application/pdf", "data": content}}],
    }
    for content in encoded_pdfs
]
cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)
print("Cache created:", cache) Full error stack traceTraceback (most recent call last):
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
return callable_(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 277, in __call__
response, ignored_call = self._with_call(
^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 332, in _with_call
return call.result(), call
^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 440, in result
raise self
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 315, in continuation
response, call = self._thunk(new_method).with_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1198, in with_call
return _end_unary_response_blocking(state, call, True, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "The service is currently unavailable."
debug_error_string = "UNKNOWN:Error received from peer ipv4:216.58.212.202:443 {created_time:"2025-01-29T16:49:05.933467+02:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/adam/Documents/1 Projects/Landstack/document_summary/utils/pdfs.py", line 163, in <module>
cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/generativeai/caching.py", line 219, in create
response = client.create_cached_content(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 883, in create_cached_content
response = rpc(
^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
return wrapped_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.
```
|
Update: switching to Vertex AI did the trick for me!
I've also found that the Vertex AI 1.5 Pro model can handle many more tokens and requests per minute than AI Studio: no retry errors with the former, endless issues with the latter.

```python
from pathlib import Path
from google.cloud import storage
from google.cloud.exceptions import NotFound
from vertexai.generative_models import Part, Content
from vertexai.preview import caching
from langchain_google_vertexai import ChatVertexAI
from dotenv import load_dotenv
load_dotenv()
llm = ChatVertexAI(model_name="gemini-1.5-pro-002")
pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]
gcs_uris = [
    upload_pdf_to_gcs(pdf_path, 'bucket-name') for pdf_path in pdf_paths
]
parts = [Part.from_uri(uri=uri, mime_type="application/pdf") for uri in gcs_uris]
contents = [Content(role="user", parts=parts)]
cached_pdfs = caching.CachedContent.create(
    model_name=llm.model_name, contents=contents
)
llm.cached_content = cached_pdfs.name
```

upload_pdf_to_gcs definition:

```python
def upload_pdf_to_gcs(
    pdf_path: str | Path,
    bucket_name: str,
    destination_blob_name: str | None = None,
    create_bucket: bool = True,
) -> str:
    """
    Upload a PDF file to Google Cloud Storage and return its URI.

    Parameters
    ----------
    pdf_path : str | Path
        Local path to the PDF file
    bucket_name : str
        Name of the GCS bucket to upload to
    destination_blob_name : str | None
        Name to give the file in GCS. If None, uses the original filename
    create_bucket : bool, default True
        If True, creates the bucket if it doesn't exist
    """
    pdf_path = Path(pdf_path)
    if not destination_blob_name:
        # Use the original filename, as the docstring promises.
        destination_blob_name = pdf_path.name
    client = storage.Client()
    try:
        bucket = client.get_bucket(bucket_name)
    except NotFound:
        if create_bucket:
            bucket = client.create_bucket(bucket_name)
        else:
            raise
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(str(pdf_path))
    file_uri = f"gs://{bucket_name}/{destination_blob_name}"
    return file_uri
```
|
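A short usage sketch for the setup above (the prompt is hypothetical, and this assumes the cache was created successfully and that ChatVertexAI honors the cached_content attribute as set above):

```python
# Query the model against the cached PDFs (hypothetical prompt).
response = llm.invoke("Summarize the key points across all of the documents.")
print(response.content)
```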
For anyone finding this thread in mid-2025 and wondering if this is still an issue, the answer: yes. I'm getting many, many 503 errors when uploading two PDFs of ~100 pages each. Tried increasing backoff in the retry logic; it didn't help. Will be moving to Vertex AI. |
Still an issue for me. Today I was able to create one context (yesterday, none) with the demo file from the docs, but not anymore. No useful error message, just "service not available". When I make the context too small, it gives me my token count and the minimum count; when I make the context too big, it gives me my token count and the maximum count. When the context is just right, the service is down... |
You can see the status of the issue: "Open". So nothing is fixed yet. I'm using the REST API from .NET and have the same issue for larger files. It seems there's a hardcoded 80-second timeout for all cache-creation requests. |
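For what it's worth, the raw REST call can be reproduced outside .NET to help isolate where the timeout lives. A hedged Python sketch follows; the field names follow the public v1beta cachedContents endpoint, the text payload is a placeholder, and a generous client-side timeout won't help if the ~80-second limit is server-side, but it rules out the local client:

```python
import os
import requests

# Create a cache via the raw v1beta REST endpoint.
resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/cachedContents",
    params={"key": os.environ["GOOGLE_API_KEY"]},
    json={
        "model": "models/gemini-1.5-flash-001",
        "contents": [{"role": "user", "parts": [{"text": "..."}]}],
        "ttl": "600s",
    },
    timeout=300,  # generous client-side timeout, to test whether the cutoff is server-side
)
print(resp.status_code, resp.text)
```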
Facing same issue |
Switching to Vertex AI will help; see my answer above for a Python implementation. |
Switching to Vertex worked for me as well. Here is a Ruby implementation that avoids using the google-cloud-ai_platform gem, which I found to be out of date and doesn't really add much value anyway. Hope it helps.
|
Description of the bug:
I'm trying to create a cache from the contents of multiple PDF files, but when the total number of tokens across the files exceeds approximately 500,000, I receive a 503 (Service Unavailable) error from Google API Core.
The error isn't returned immediately, but rather after about 40 to 50 seconds, which might indicate that a timeout is occurring in Google API Core.
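One way to pin down the threshold is to count tokens before attempting to create the cache. A minimal sketch, assuming `contents` is built as in the comments above:

```python
import google.generativeai as genai

# Measure the payload before caching, to confirm whether failures
# correlate with the ~500k-token mark reported here.
model = genai.GenerativeModel("models/gemini-1.5-flash-001")
print("total tokens:", model.count_tokens(contents).total_tokens)
```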
Code
Version
Actual vs expected behavior:
Actual behavior
Expected behavior
Any other information you'd like to share?
Upon reviewing the Gemini API documentation, I noticed an interesting mismatch regarding token limits. The maximum token count is described as depending on the specific model in use. In my case, I'm using the models/gemini-1.5-flash-001 model, which has a maximum input token limit of 1,048,576, so I initially assumed that processing around 500,000 tokens should work without any issues. Moreover, I was able to successfully generate the cache even with token counts exceeding 800,000 when creating it from a string. This leads me to suspect that there might be a bug specifically related to creating caches from files with high token counts, as opposed to string-based caching.
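For comparison, here is a hedged sketch of the string-based path that succeeded. It assumes CachedContent.create normalizes plain strings into user content the way generate_content does, and the repeated text is just a stand-in for a genuinely large document:

```python
import google.generativeai as genai

# String-based caching succeeded even past 800k tokens, unlike the
# inline-PDF version; the repeated text is a stand-in for real content.
big_text = "lorem ipsum dolor sit amet " * 150_000
cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    contents=[big_text],
)
print("Cache created:", cache.name)
```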