503 The service is currently unavailable when using Context caching Feature #500

Open
okada1220 opened this issue Aug 8, 2024 · 15 comments
Labels
component:other Questions unrelated to SDK status:triaged Issue/PR triaged to the corresponding sub-team type:bug Something isn't working

Comments

@okada1220

okada1220 commented Aug 8, 2024

Description of the bug:

I'm trying to create a cache by reading the contents of multiple PDF files, but when the total number of tokens within the files exceeds approximately 500,000 tokens, I receive a 503 error (Service Unavailable) from Google API Core.

It seems that the error isn't returning immediately, but rather after about 40 to 50 seconds. This might indicate that a timeout is occurring in Google API Core.

Code

import google.generativeai as genai
import os

gemini_api_key = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=gemini_api_key)

documents = []
file_list = ["xxx.pdf", "yyy.pdf", ...]
for file in file_list:
  gemini_file = genai.upload_file(path=file, display_name=file)
  documents.append(gemini_file)

gemini_client = genai.GenerativeModel("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"total_token: {total_token}")
# total_token: 592403

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)

Version

  • Python 3.9.19
  • google==3.0.0
  • google-ai-generativelanguage==0.6.6
  • google-api-core==2.19.0
  • google-api-python-client==2.105.0
  • google-auth==2.29.0
  • google-auth-httplib2==0.2.0
  • google-generativeai==0.7.2
  • googleapis-common-protos==1.63.0

Actual vs expected behavior:

Actual behavior

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The service is currently unavailable."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.175.234:443 {created_time:"2024-08-06T13:37:03.077186006+09:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
  File "/usr/local/lib/python3.9/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 874, in create_cached_content
    response = rpc(
  File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

Expected behavior

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
print(gemini_cache)

# CachedContent(
#     name='cachedContents/l5ataay9naq2',
#     model='models/gemini-1.5-flash-001',
#     display_name='sample',
#     usage_metadata={
#         'total_token_count': 592403,
#     },
#     create_time=2024-08-08 01:21:44.925021+00:00,
#     update_time=2024-08-08 01:21:44.925021+00:00,
#     expire_time=2024-08-08 02:21:43.787890+00:00
# )

Any other information you'd like to share?

The minimum input token count for context caching is 32,768, and the maximum is the same as the maximum for the given model. (For more on counting tokens, see the Token guide).

Upon reviewing the Gemini API documentation, I noticed an interesting mismatch regarding token limits. The maximum token count is described as being dependent on the specific model in use. In my case, I'm using the models/gemini-1.5-flash-001 model, which has a maximum input token limit of 1,048,576. Based on this information, I initially assumed that processing around 500,000 tokens should work without any issues.

Moreover, I was able to successfully generate the cache with token counts exceeding 800,000 when creating the cache from a string. This leads me to suspect that there might be a bug specifically related to creating caches from files with high token counts, as opposed to string-based caching.
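
As a quick sanity check (a minimal sketch on my end, assuming genai.get_model exposes the model's input_token_limit), the totals do sit between the documented minimum and the model's limit:

import google.generativeai as genai

# Sketch: confirm the cache size is within the documented context caching limits.
# gemini_client and documents are from the snippet above.
MIN_CACHE_TOKENS = 32_768  # documented minimum for context caching

model_info = genai.get_model("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"{total_token} / {model_info.input_token_limit}")
# 592403 / 1048576
assert MIN_CACHE_TOKENS <= total_token <= model_info.input_token_limit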

@gurugecl

gurugecl commented Aug 11, 2024

I'm experiencing the same issue even when using models/gemini-1.5-pro-001 and trying to cache roughly 300k tokens, even though it has an input token limit of 2,097,152.

@singhniraj08

@okada1220,

Thank you for reporting this issue. This looks like an intermittent error and should work now. Automatic retry logic has been added to the SDK to avoid these errors, and you can follow the google-gemini/cookbook#469 FR for examples of retry logic. Thanks

@singhniraj08 singhniraj08 added type:bug Something isn't working status:awaiting user response Awaiting a response from the author component:other Questions unrelated to SDK labels Aug 13, 2024
@okada1220

@singhniraj08
Thank you for your response.

I checked again, and it seems that the same error is still occurring...

I looked at the retry logic example in google-gemini/cookbook#469, which seems to apply when using request_options with generate_content. But since I'm using genai.caching.CachedContent.create, which doesn't have request_options, I'm wondering if this retry logic is still applicable here. Do you think this approach will work in my case?
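
In the meantime, this is the kind of wrapper I'm considering (a minimal sketch that wraps the create call in google-api-core's Retry helper directly, rather than request_options; it assumes the 503 surfaces as ServiceUnavailable, as in my traceback above):

from google.api_core import exceptions, retry
import google.generativeai as genai

# Sketch: retry CachedContent.create on transient 503s with exponential backoff.
@retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    initial=2.0,    # first delay in seconds
    maximum=60.0,   # cap on the delay between attempts
    timeout=300.0,  # give up after 5 minutes overall
)
def create_cache(documents):
    return genai.caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        display_name="sample",
        contents=documents,
    )

gemini_cache = create_cache(documents)

Of course, this only helps if the 503 is genuinely transient.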

@singhniraj08 singhniraj08 added status:triaged Issue/PR triaged to the corresponding sub-team and removed status:awaiting user response Awaiting a response from the author labels Aug 14, 2024
@nate-walter

I'm receiving this error too

@Balzard

Balzard commented Jan 15, 2025

I'm experiencing the same issue. Context caching with PDF files raises HTTP 503, while directly injecting strings into the cache works. Any update on this issue?

@sobjornstad

Also having the exact same issue with a collection of large PDFs.

@BrianHung

This might be an internal API error because I've gotten 503 using TypeScript.

@MarkDaoust MarkDaoust removed their assignment Jan 28, 2025
@codeananda

codeananda commented Jan 29, 2025

I'm also having this issue.

Note that this still doesn't work even if you base64 encode the PDFs.

import os
from pathlib import Path
from base64 import b64encode

import google.generativeai as genai

os.environ["GOOGLE_API_KEY"] = "xxxxxx"

pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]

encoded_pdfs = [b64encode(path.read_bytes()).decode("utf-8") for path in pdf_paths]
contents = [
    {
        "role": "user",
        "parts": [{"inline_data": {"mime_type": "application/pdf", "data": content}}],
    }
    for content in encoded_pdfs
]

cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)

print("Cache created:", cache)
Full error stack trace
Traceback (most recent call last):
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
                             ^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
           ^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1198, in with_call
    return _end_unary_response_blocking(state, call, True, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The service is currently unavailable."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:216.58.212.202:443 {created_time:"2025-01-29T16:49:05.933467+02:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/adam/Documents/1 Projects/Landstack/document_summary/utils/pdfs.py", line 163, in <module>
    cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 883, in create_cached_content
    response = rpc(
               ^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

@codeananda

Update: switching to Vertex AI did the trick for me!

  1. Install libraries: pip install vertexai langchain_google_vertexai
  2. Set .env vars
  3. Chunk PDFs below 50 MB (see the chunking sketch at the end of this comment)
  4. Upload to GCS and get the URIs
  5. Cache using the vertexai implementation

.env contents

VERTEX_PROJECT_ID=xxxxxx
VERTEX_LOCATION=xxxxxx
VERTEX_CREDENTIAL_PATH=path_to_credentials.json
GOOGLE_APPLICATION_CREDENTIALS=path_to_credentials.json

I've also found that the Vertex AI 1.5 Pro model can handle many more tokens and requests per minute than AI Studio. I had no retry errors with the former and endless issues with the latter.

from pathlib import Path
from google.cloud import storage
from google.cloud.exceptions import NotFound
from vertexai.generative_models import Part, Content
from vertexai.preview import caching
from langchain_google_vertexai import ChatVertexAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatVertexAI(model_name="gemini-1.5-pro-002")

pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]

gcs_uris = [
    upload_pdf_to_gcs(pdf_path, 'bucket-name') for pdf_path in pdf_paths
]

parts = [Part.from_uri(uri=uri, mime_type="application/pdf") for uri in gcs_uris]
contents = [Content(role="user", parts=parts)]

cached_pdfs = caching.CachedContent.create(
    model_name=llm.model_name, contents=contents
)

llm.cached_content = cached_pdfs.name

upload_pdf_to_gcs definition:
def upload_pdf_to_gcs(
    pdf_path: str | Path,
    bucket_name: str,
    destination_blob_name: str | None = None,
    create_bucket: bool = True,
) -> str:
    """
    Upload a PDF file to Google Cloud Storage and return its URI.

    Parameters
    ----------
    pdf_path : str | Path
        Local path to the PDF file
    bucket_name : str
        Name of the GCS bucket to upload to
    destination_blob_name : str | None
        Name to give the file in GCS. If None, uses the original file path
    create_bucket : bool, default True
        If True, creates the bucket if it doesn't exist
    """
    pdf_path = Path(pdf_path)

    if not destination_blob_name:
        destination_blob_name = str(pdf_path)

    client = storage.Client()

    try:
        bucket = client.get_bucket(bucket_name)
    except NotFound:
        if create_bucket:
            bucket = client.create_bucket(bucket_name)
        else:
            raise

    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(str(pdf_path))

    file_uri = f"gs://{bucket_name}/{destination_blob_name}"
    return file_uri
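
For step 3 (chunking), a rough page-based split like the one below is what I mean; pypdf and the pages_per_chunk value are just illustrative assumptions, adjust them so every chunk stays under 50 MB:

from io import BytesIO
from pathlib import Path

from pypdf import PdfReader, PdfWriter  # assumption: pip install pypdf

MAX_CHUNK_BYTES = 50 * 1024 * 1024  # keep each chunk under the 50 MB limit

def chunk_pdf(pdf_path: str | Path, pages_per_chunk: int = 100) -> list[Path]:
    """Split a PDF into smaller files of at most pages_per_chunk pages each."""
    pdf_path = Path(pdf_path)
    reader = PdfReader(pdf_path)
    chunk_paths = []
    for start in range(0, len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        for i in range(start, min(start + pages_per_chunk, len(reader.pages))):
            writer.add_page(reader.pages[i])
        buffer = BytesIO()
        writer.write(buffer)
        if buffer.getbuffer().nbytes > MAX_CHUNK_BYTES:
            raise ValueError("Chunk still exceeds 50 MB; lower pages_per_chunk")
        part_path = pdf_path.with_name(f"{pdf_path.stem}_part{start // pages_per_chunk}.pdf")
        part_path.write_bytes(buffer.getvalue())
        chunk_paths.append(part_path)
    return chunk_paths

The resulting chunk paths can then go straight into upload_pdf_to_gcs above.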

@ctg5

ctg5 commented Apr 29, 2025

For anyone finding this thread in mid-2025 and wondering if this is still an issue, the answer is yes. I'm getting many, many 503 errors when uploading two PDFs of ~100 pages each. I tried increasing the backoff in my retry logic; it didn't help. I will be moving to Vertex AI.

@sebastian-305

Still an issue for me. Today I was able to create one context (yesterday none) with the demo file from the docs, but not anymore... no useful error message, just "service not available". When I make the context too small, it gives me my token count and the minimum count; when I make the context too big, it gives me my token count and the maximum count. When the context is just right, the service is down...

@ovicrisan

ovicrisan commented Apr 29, 2025

You can see the status of the issue: "Open". So nothing is fixed yet. I'm using the REST API from .NET and have the same issue for larger files. It seems there is a hardcoded 80-second timeout for all cache creation.

@devagarwal007

Facing the same issue.

@codeananda

Switching to Vertex AI will help; see my answer above for a Python implementation.

@kmatthews812

Switching to Vertex worked for me as well. Here is a Ruby implementation that avoids using the google-cloud-ai_platform gem, which I found to be out of date and doesn't really add much value anyway. Hope it helps.

require 'google/cloud/storage'
require 'googleauth'
require 'httparty'

class VertexAdapter
  PROJECT_ID = ENV['VERTEX_PROJECT_ID']
  LOCATION = ENV['VERTEX_LOCATION']
  MODEL_NAME = "gemini-2.5-pro-preview-03-25"
  BUCKET_NAME = ENV["GOOGLE_CLOUD_STORAGE_BUCKET"]
  BASE_URL = "https://#{LOCATION}-aiplatform.googleapis.com"
  PARENT = "projects/#{PROJECT_ID}/locations/#{LOCATION}"  
  HTTP_TIMEOUT = 480
  CACHE_TTL = "1800s"

  def initialize
    scope = "https://www.googleapis.com/auth/cloud-platform"
    authorization = Google::Auth.get_application_default(scope)
    token = authorization.fetch_access_token!["access_token"]
    @api_headers = { "Content-Type" => "application/json", "Authorization" => "Bearer #{token}" }
  end

  def upload_file(pdf_content, s3_key)
    Rails.logger.info("Uploading to Vertex (Google Cloud Storage) API")
    
    storage = Google::Cloud::Storage.new(
      project_id: ENV["VERTEX_PROJECT_ID"]
    )

    # Fetch the bucket, creating it if it doesn't exist
    bucket = storage.bucket(BUCKET_NAME) || storage.create_bucket(BUCKET_NAME)

    # Upload the file with a path that includes the upload job ID
    file = bucket.create_file(
      StringIO.new(pdf_content),
      s3_key,
      content_type: 'application/pdf'
    )

    file.public_url
  end

  def check_cache(cache_id)
    Rails.logger.info("Checking cache for #{cache_id}")
    begin
      endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents/#{cache_id}"
      Rails.logger.info("Checking cache for #{endpoint}")
      response = HTTParty.get(endpoint, headers: @api_headers)
      if response.code == 200
        Rails.logger.info("Cache found for #{cache_id}")
        true
      else
        Rails.logger.info("Cache not found for #{cache_id}")
        false
      end
    rescue => e
      # On network or server error, handle the same as a cache miss
      Rails.logger.error("Cache lookup error: #{e.message}")
      false
    end
  end

  def delete_cache(cache_id)
    Rails.logger.info("Deleting cache for #{cache_id}")
    begin
      endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents/#{cache_id}"
      response = HTTParty.delete(endpoint, headers: @api_headers)
      if response.code == 200
        Rails.logger.info("Cache deleted for #{cache_id}")
        true
      else
        false
      end
    rescue => e
      # On network or server error, handle the same as a cache miss
      Rails.logger.error("Cache delete error: #{e.message}")
      false
    end
  end

  def cache_file(s3_key)
    system_prompt = File.read(Rails.root.join("lib", "instructions", "system_prompt.md"))
    body = {
      "model": "projects/#{PROJECT_ID}/locations/#{LOCATION}/publishers/google/models/#{MODEL_NAME}",
      "contents":[
        {
          "parts":[
            {"file_data": {"mime_type": "application/pdf", "file_uri": "gs://#{BUCKET_NAME}/#{s3_key}"}},
          ],
          "role": "user"
        }
      ],
      "systemInstruction": {
        "parts": [
          {
            "text": system_prompt
          }
        ],
        "role": "system"
      },
      "ttl": CACHE_TTL
    }
    
    endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents"
    response = HTTParty.post(
      endpoint,
      headers: @api_headers,
      body: body.to_json,
      timeout: HTTP_TIMEOUT
    )

    if response.code == 200
      cached_name = JSON.parse(response.body)["name"]
      Rails.logger.debug("Successfully cached as #{cached_name}")
      cached_name
    else
      Rails.logger.error("API Error (#{response.code}): #{response.body}")
      raise RuntimeError, "API Error (#{response.code}): #{response.body}"
    end
  end

  def generate_content(cached_name)
    Rails.logger.info("Generating content from cached context")

    user_prompt = File.read(Rails.root.join("lib", "instructions", "user_prompt.md"))
    
    body = {
      "contents": [
        {
          "role": "user",
          "parts": [
            {
              "text": user_prompt,
            },
          ]
        },
      ],
      "cachedContent": cached_name,
      "generationConfig": {
        "responseMimeType": "text/plain",
      },
    }

    endpoint = "#{BASE_URL}/v1/#{PARENT}/publishers/google/models/#{MODEL_NAME}:generateContent"
    response = HTTParty.post(
      endpoint,
      headers: @api_headers,
      body: body.to_json,
      timeout: HTTP_TIMEOUT
    )

    if response.code == 200
      response_json = JSON.parse(response.body)
      response_json.dig("candidates", 0, "content", "parts", 0, "text")
    else
      Rails.logger.error("API Error (#{response.code}): #{response.body}")
      raise RuntimeError, "API Error (#{response.code}): #{response.body}"
    end
  end
end
