503 The service is currently unavailable when using Context caching Feature #500

Open
okada1220 opened this issue Aug 8, 2024 · 15 comments
Labels
component:other Questions unrelated to SDK status:triaged Issue/PR triaged to the corresponding sub-team type:bug Something isn't working

Comments

@okada1220

okada1220 commented Aug 8, 2024

Description of the bug:

I'm trying to create a cache by reading the contents of multiple PDF files, but when the total number of tokens within the files exceeds approximately 500,000 tokens, I receive a 503 error (Service Unavailable) from Google API Core.

It seems that the error isn't returning immediately, but rather after about 40 to 50 seconds. This might indicate that a timeout is occurring in Google API Core.

Code

import google.generativeai as genai
import os

gemini_api_key = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=gemini_api_key)

documents = []
file_list = ["xxx.pdf", "yyy.pdf", ...]
for file in file_list:
  gemini_file = genai.upload_file(path=file, display_name=file)
  documents.append(gemini_file)

gemini_client = genai.GenerativeModel("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"total_token: {total_token}")
# total_token: 592403

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)

Version

  • Python 3.9.19
  • google==3.0.0
  • google-ai-generativelanguage==0.6.6
  • google-api-core==2.19.0
  • google-api-python-client==2.105.0
  • google-auth==2.29.0
  • google-auth-httplib2==0.2.0
  • google-generativeai==0.7.2
  • googleapis-common-protos==1.63.0

Actual vs expected behavior:

Actual behavior

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The service is currently unavailable."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.175.234:443 {created_time:"2024-08-06T13:37:03.077186006+09:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
  File "/usr/local/lib/python3.9/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 874, in create_cached_content
    response = rpc(
  File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

Expected behavior

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
print(gemini_cache)

# CachedContent(
#     name='cachedContents/l5ataay9naq2',
#     model='models/gemini-1.5-flash-001',
#     display_name='sample',
#     usage_metadata={
#         'total_token_count': 592403,
#     },
#     create_time=2024-08-08 01:21:44.925021+00:00,
#     update_time=2024-08-08 01:21:44.925021+00:00,
#     expire_time=2024-08-08 02:21:43.787890+00:00
# )

Any other information you'd like to share?

The minimum input token count for context caching is 32,768, and the maximum is the same as the maximum for the given model. (For more on counting tokens, see the Token guide).

Upon reviewing the Gemini API documentation, I noticed an interesting mismatch regarding token limits. The maximum token count is described as being dependent on the specific model in use. In my case, I'm using the models/gemini-1.5-flash-001 model, which has a maximum input token limit of 1,048,576. Based on this information, I initially assumed that processing around 500,000 tokens should work without any issues.

Moreover, I was able to successfully generate the cache with token counts exceeding 800,000 when creating the cache from a string. This leads me to suspect that there might be a bug specifically related to creating caches from files with high token counts, as opposed to string-based caching.
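
As a quick sanity check (a minimal sketch on my end, assuming genai.get_model exposes the model's input_token_limit), the totals do sit between the documented minimum and the model's limit:

import google.generativeai as genai

# Sketch: confirm the cache size is within the documented context caching limits.
# gemini_client and documents are from the snippet above.
MIN_CACHE_TOKENS = 32_768  # documented minimum for context caching

model_info = genai.get_model("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"{total_token} / {model_info.input_token_limit}")
# 592403 / 1048576
assert MIN_CACHE_TOKENS <= total_token <= model_info.input_token_limit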

@gurugecl

gurugecl commented Aug 11, 2024

I'm experiencing the same issue even when using models/gemini-1.5-pro-001 and trying to cache roughly 300k tokens, even though it has an input token limit of 2,097,152.

@singhniraj08

@okada1220,

Thank you for reporting this issue. This looks like an intermittent error and should work now. Automatic retry logic has been added to the SDK to avoid these errors, and you can follow the google-gemini/cookbook#469 FR for examples of retry logic. Thanks

@singhniraj08 singhniraj08 added type:bug Something isn't working status:awaiting user response Awaiting a response from the author component:other Questions unrelated to SDK labels Aug 13, 2024
@okada1220

@singhniraj08
Thank you for your response.

I checked again, and it seems that the same error is still occurring...

I looked at the retry logic example in google-gemini/cookbook#469, which seems to apply when using request_options with generate_content. But since I'm using genai.caching.CachedContent.create, which doesn't have request_options, I'm wondering if this retry logic is still applicable here. Do you think this approach will work in my case?
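
In the meantime, this is the kind of wrapper I'm considering (a minimal sketch that wraps the create call in google-api-core's Retry helper directly, rather than request_options; it assumes the 503 surfaces as ServiceUnavailable, as in my traceback above):

from google.api_core import exceptions, retry
import google.generativeai as genai

# Sketch: retry CachedContent.create on transient 503s with exponential backoff.
@retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    initial=2.0,    # first delay in seconds
    maximum=60.0,   # cap on the delay between attempts
    timeout=300.0,  # give up after 5 minutes overall
)
def create_cache(documents):
    return genai.caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        display_name="sample",
        contents=documents,
    )

gemini_cache = create_cache(documents)

Of course, this only helps if the 503 is genuinely transient.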

@singhniraj08 singhniraj08 added status:triaged Issue/PR triaged to the corresponding sub-team and removed status:awaiting user response Awaiting a response from the author labels Aug 14, 2024
@nate-walter

I'm receiving this error too

@Balzard

Balzard commented Jan 15, 2025

I'm experiencing the same issue. Context caching with PDF files raises HTTP 503, while directly injecting strings into the cache works. Any update on this issue?

@sobjornstad

Also having the exact same issue with a collection of large PDFs.

@BrianHung

This might be an internal API error because I've gotten 503 using TypeScript.

@MarkDaoust MarkDaoust removed their assignment Jan 28, 2025
@codeananda

codeananda commented Jan 29, 2025

I'm also having this issue.

Note that this still doesn't work even if you base64 encode the PDFs.

import os
from pathlib import Path
from base64 import b64encode

import google.generativeai as genai

os.environ["GOOGLE_API_KEY"] = "xxxxxx"

pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]

encoded_pdfs = [b64encode(path.read_bytes()).decode("utf-8") for path in pdf_paths]
contents = [
    {
        "role": "user",
        "parts": [{"inline_data": {"mime_type": "application/pdf", "data": content}}],
    }
    for content in encoded_pdfs
]

cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)

print("Cache created:", cache)
Full error stack trace
Traceback (most recent call last):
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
                             ^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
           ^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1198, in with_call
    return _end_unary_response_blocking(state, call, True, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "The service is currently unavailable."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:216.58.212.202:443 {created_time:"2025-01-29T16:49:05.933467+02:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/adam/Documents/1 Projects/Landstack/document_summary/utils/pdfs.py", line 163, in <module>
    cache = genai.caching.CachedContent.create(model="gemini-1.5-flash-001", contents=contents)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 883, in create_cached_content
    response = rpc(
               ^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adam/Library/Caches/pypoetry/virtualenvs/document-summary-pgPpAF9N-py3.12/lib/python3.12/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

@codeananda

Update: switching to Vertex AI did the trick for me!

  1. Install libraries: pip install vertexai langchain_google_vertexai
  2. Set .env vars
  3. Chunk PDFs below 50 MB (see the chunking sketch at the end of this comment)
  4. Upload to GCS and get the URIs
  5. Cache using the vertexai implementation

.env contents

VERTEX_PROJECT_ID=xxxxxx
VERTEX_LOCATION=xxxxxx
VERTEX_CREDENTIAL_PATH=path_to_credentials.json
GOOGLE_APPLICATION_CREDENTIALS=path_to_credentials.json

I've also found that the Vertex AI 1.5 Pro model can handle many more tokens and requests per minute than AI Studio. I had no retry errors with the former and endless issues with the latter.

from pathlib import Path
from google.cloud import storage
from google.cloud.exceptions import NotFound
from vertexai.generative_models import Part, Content
from vertexai.preview import caching
from langchain_google_vertexai import ChatVertexAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatVertexAI(model_name="gemini-1.5-pro-002")

pdf_paths = [Path("1.pdf"), Path("2.pdf"), ...]

gcs_uris = [
    upload_pdf_to_gcs(pdf_path, 'bucket-name') for pdf_path in pdf_paths
]

parts = [Part.from_uri(uri=uri, mime_type="application/pdf") for uri in gcs_uris]
contents = [Content(role="user", parts=parts)]

cached_pdfs = caching.CachedContent.create(
    model_name=llm.model_name, contents=contents
)

llm.cached_content = cached_pdfs.name

upload_pdf_to_gcs definition:
def upload_pdf_to_gcs(
    pdf_path: str | Path,
    bucket_name: str,
    destination_blob_name: str | None = None,
    create_bucket: bool = True,
) -> str:
    """
    Upload a PDF file to Google Cloud Storage and return its URI.

    Parameters
    ----------
    pdf_path : str | Path
        Local path to the PDF file
    bucket_name : str
        Name of the GCS bucket to upload to
    destination_blob_name : str | None
        Name to give the file in GCS. If None, uses the original file path
    create_bucket : bool, default True
        If True, creates the bucket if it doesn't exist
    """
    pdf_path = Path(pdf_path)

    if not destination_blob_name:
        destination_blob_name = str(pdf_path)

    client = storage.Client()

    try:
        bucket = client.get_bucket(bucket_name)
    except NotFound:
        if create_bucket:
            bucket = client.create_bucket(bucket_name)
        else:
            raise

    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(str(pdf_path))

    file_uri = f"gs://{bucket_name}/{destination_blob_name}"
    return file_uri
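
For step 3 (chunking), a rough page-based split like the one below is what I mean; pypdf and the pages_per_chunk value are just illustrative assumptions, adjust them so every chunk stays under 50 MB:

from io import BytesIO
from pathlib import Path

from pypdf import PdfReader, PdfWriter  # assumption: pip install pypdf

MAX_CHUNK_BYTES = 50 * 1024 * 1024  # keep each chunk under the 50 MB limit

def chunk_pdf(pdf_path: str | Path, pages_per_chunk: int = 100) -> list[Path]:
    """Split a PDF into smaller files of at most pages_per_chunk pages each."""
    pdf_path = Path(pdf_path)
    reader = PdfReader(pdf_path)
    chunk_paths = []
    for start in range(0, len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        for i in range(start, min(start + pages_per_chunk, len(reader.pages))):
            writer.add_page(reader.pages[i])
        buffer = BytesIO()
        writer.write(buffer)
        if buffer.getbuffer().nbytes > MAX_CHUNK_BYTES:
            raise ValueError("Chunk still exceeds 50 MB; lower pages_per_chunk")
        part_path = pdf_path.with_name(f"{pdf_path.stem}_part{start // pages_per_chunk}.pdf")
        part_path.write_bytes(buffer.getvalue())
        chunk_paths.append(part_path)
    return chunk_paths

The resulting chunk paths can then go straight into upload_pdf_to_gcs above.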

@ctg5

ctg5 commented Apr 29, 2025

For anyone finding this thread in mid-2025 and wondering if this is still an issue, the answer is yes. I'm getting many, many 503 errors when uploading two PDFs of ~100 pages each. I tried increasing the backoff in my retry logic; it didn't help. I will be moving to Vertex AI.

@sebastian-305

Still an issue for me. Today I was able to create one context (yesterday none) with the demo file from the docs, but not anymore... no useful error message, just "service not available". When I make the context too small, it gives me my token count and the minimum count; when I make the context too big, it gives me my token count and the maximum count. When the context is just right, the service is down...

@ovicrisan

ovicrisan commented Apr 29, 2025

You can see the status of the issue: "Open". So nothing is fixed yet. I'm using the REST API from .NET and have the same issue for larger files. It seems there is a hardcoded 80-second timeout for all cache creation.

@devagarwal007

Facing the same issue.

@codeananda

Switching to Vertex AI will help; see my answer above for a Python implementation.

@kmatthews812

Switching to Vertex worked for me as well. Here is a Ruby implementation that avoids using the google-cloud-ai_platform gem, which I found to be out of date and doesn't really add much value anyway. Hope it helps.

require 'google/cloud/storage'
require 'googleauth'
require 'httparty'

class VertexAdapter
  PROJECT_ID = ENV['VERTEX_PROJECT_ID']
  LOCATION = ENV['VERTEX_LOCATION']
  MODEL_NAME = "gemini-2.5-pro-preview-03-25"
  BUCKET_NAME = ENV["GOOGLE_CLOUD_STORAGE_BUCKET"]
  BASE_URL = "https://#{LOCATION}-aiplatform.googleapis.com"
  PARENT = "projects/#{PROJECT_ID}/locations/#{LOCATION}"  
  HTTP_TIMEOUT = 480
  CACHE_TTL = "1800s"

  def initialize
    scope = "https://www.googleapis.com/auth/cloud-platform"
    authorization = Google::Auth.get_application_default(scope)
    token = authorization.fetch_access_token!["access_token"]
    @api_headers = { "Content-Type" => "application/json", "Authorization" => "Bearer #{token}" }
  end

  def upload_file(pdf_content, s3_key)
    Rails.logger.info("Uploading to Vertex (Google Cloud Storage) API")
    
    storage = Google::Cloud::Storage.new(
      project_id: ENV["VERTEX_PROJECT_ID"]
    )

    # Fetch the bucket, creating it if it doesn't exist
    bucket = storage.bucket(BUCKET_NAME) || storage.create_bucket(BUCKET_NAME)

    # Upload the file with a path that includes the upload job ID
    file = bucket.create_file(
      StringIO.new(pdf_content),
      s3_key,
      content_type: 'application/pdf'
    )

    file.public_url
  end

  def check_cache(cache_id)
    Rails.logger.info("Checking cache for #{cache_id}")
    begin
      endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents/#{cache_id}"
      Rails.logger.info("Checking cache for #{endpoint}")
      response = HTTParty.get(endpoint, headers: @api_headers)
      if response.code == 200
        Rails.logger.info("Cache found for #{cache_id}")
        true
      else
        Rails.logger.info("Cache not found for #{cache_id}")
        false
      end
    rescue => e
      # On network or server error, handle the same as a cache miss
      Rails.logger.error("Cache lookup error: #{e.message}")
      false
    end
  end

  def delete_cache(cache_id)
    Rails.logger.info("Deleting cache for #{cache_id}")
    begin
      endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents/#{cache_id}"
      response = HTTParty.delete(endpoint, headers: @api_headers)
      if response.code == 200
        Rails.logger.info("Cache deleted for #{cache_id}")
        true
      else
        false
      end
    rescue => e
      # On network or server error, handle the same as a cache miss
      Rails.logger.error("Cache delete error: #{e.message}")
      false
    end
  end

  def cache_file(s3_key)
    system_prompt = File.read(Rails.root.join("lib", "instructions", "system_prompt.md"))
    body = {
      "model": "projects/#{PROJECT_ID}/locations/#{LOCATION}/publishers/google/models/#{MODEL_NAME}",
      "contents":[
        {
          "parts":[
            {"file_data": {"mime_type": "application/pdf", "file_uri": "gs://#{BUCKET_NAME}/#{s3_key}"}},
          ],
          "role": "user"
        }
      ],
      "systemInstruction": {
        "parts": [
          {
            "text": system_prompt
          }
        ],
        "role": "system"
      },
      "ttl": CACHE_TTL
    }
    
    endpoint = "#{BASE_URL}/v1/#{PARENT}/cachedContents"
    response = HTTParty.post(
      endpoint,
      headers: @api_headers,
      body: body.to_json,
      timeout: HTTP_TIMEOUT
    )

    if response.code == 200
      cached_name = JSON.parse(response.body)["name"]
      Rails.logger.debug("Successfully cached as #{cached_name}")
      cached_name
    else
      Rails.logger.error("API Error (#{response.code}): #{response.body}")
      raise RuntimeError, "API Error (#{response.code}): #{response.body}"
    end
  end

  def generate_content(cached_name)
    Rails.logger.info("Generating content from cached context")

    user_prompt = File.read(Rails.root.join("lib", "instructions", "user_prompt.md"))
    
    body = {
      "contents": [
        {
          "role": "user",
          "parts": [
            {
              "text": user_prompt,
            },
          ]
        },
      ],
      "cachedContent": cached_name,
      "generationConfig": {
        "responseMimeType": "text/plain",
      },
    }

    endpoint = "#{BASE_URL}/v1/#{PARENT}/publishers/google/models/#{MODEL_NAME}:generateContent"
    response = HTTParty.post(
      endpoint,
      headers: @api_headers,
      body: body.to_json,
      timeout: HTTP_TIMEOUT
    )

    if response.code == 200
      response_json = JSON.parse(response.body)
      response_json.dig("candidates", 0, "content", "parts", 0, "text")
    else
      Rails.logger.error("API Error (#{response.code}): #{response.body}")
      raise RuntimeError, "API Error (#{response.code}): #{response.body}"
    end
  end
end
