Skip to content

Implement TTL for cached datasource instances #1230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

iwysiu
Copy link
Contributor

@iwysiu iwysiu commented Feb 12, 2025

What this PR does / why we need it:
Investigation for implementing caching for datasource instances. It's pretty simple and straightforward. I did this using a popular caching library that we already use in core grafana.

There are a couple minor implementation notes/questions:

  • The linked issue mentions expiring after a certain amount of inactivity, instead this pr automatically expires instances after a set amount of time. I decided to do this because tracking the activity would require us to take a write lock every time we get an instance so that we could update it and would require us to write our own janitor processes.
  • Both the InstanceManager consumers in this repo generate string keys, but updating the interface to expect one is technically a breaking change, so I didn't make it. We could make a new InstanceManagerWithExpiration type if we wanted to avoid the sprintf
  • Would we want this behavior behind a feature flag?

Which issue(s) this PR fixes:

Fixes https://github.com/grafana/data-sources/issues/329

Special notes for your reviewer:
Opening this pr for review to get it in front of eyes that may have opinions about the implementation questions

@iwysiu iwysiu changed the title WIP: Implementing TTL for cached datasource instances Implement TTL for cached datasource instances Apr 17, 2025
@iwysiu iwysiu marked this pull request as ready for review April 17, 2025 18:34
@iwysiu iwysiu requested a review from a team as a code owner April 17, 2025 18:34
@iwysiu iwysiu requested review from andresmgot, oshirohugo and briangann and removed request for a team April 17, 2025 18:34
@wbrowne wbrowne self-requested a review April 18, 2025 09:25
Copy link
Contributor

@beejeebus beejeebus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some small suggestions, but they aren't blockers to this PR.

Comment on lines +117 to +120
strKey := fmt.Sprintf("%v", cacheKey)
im.locker.RLock(strKey)
item, ok := im.cache.Get(strKey)
im.locker.RUnlock(strKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the shared read locking here looks unnecessary. even though your PR didn't introduce it, I'd be in favour of removing it.

}

instance, err := im.provider.NewInstance(ctx, pluginContext)
if err != nil {
return nil, err
}
im.cache.Store(cacheKey, CachedInstance{
im.cache.Set(strKey, CachedInstance{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there's a cache.SetDefault() method for the 'set this and use the default expiration' case.

} else {
activeInstances.Dec()
}
im.cache.Delete(strKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a little faster to avoid this delete, make a function for the disposer code, and call that function from here and in the OnEvicted handler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the added complexity probably isn't worth it here - Delete here does acquire a lock, but we'll only hit this code path when the datasource config or Grafana config are updated, so quite rarely in the scheme of things.

Copy link
Contributor

@njvrzm njvrzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but I do think it would be better if we can implement this as "expire after idle time" rather than "expire after creation". This would let us set a much shorter expiration time, maybe an hour or fifteen minutes even, and make the whole thing much more effective (and a tiiiny bit less disruptive).

Fortunately that looks pretty straightforward! I think all we need to do to is to add

im.cache.SetDefault(strKey, ci)

before the returns on lines 127 and 141. This would reset the expiration for the item every time it's retrieved. There's no harm here if multiple threads do this within a short period of time, as they'll all be accomplishing essentially the same thing, so there's no need for extra locking.

}

tip := &testInstanceProvider{}
im := NewWithOptions(tip, time.Millisecond, 2*time.Millisecond, defaultDisposeTTL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, this test will take 5 seconds - should we use a smaller dispose TTL here?

} else {
activeInstances.Dec()
}
im.cache.Delete(strKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the added complexity probably isn't worth it here - Delete here does acquire a lock, but we'll only hit this code path when the datasource config or Grafana config are updated, so quite rarely in the scheme of things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants