add memory properties API #1263
Comments
I vote for option 5.
Also, we need an example showing the memory property API together with a cache, so if a user needs one they can copy-paste it from the example.
@lplewa, the problem is that not all properties have a 1:1 relation with the provider. For example, the memory ID is unique per allocation, and the provider may include several allocations. In this case we would have to use 3 different caches to get all the properties: `umfPoolGetMemoryProvider`, the user-side cache `props_cache[provider]`, and an additional "cache" inside the provider to get `mem_id` and other allocation-related properties (`umfMemoryProviderGetAllocationID`, etc.). I am not sure that it will show good performance, but we may check. I also do not see what advantages option 5 provides compared to using the native CUDA/Level Zero API: we can already get all properties from CUDA/Level Zero and then cache them.
A few more properties which are used in MPI: the base pointer and size of the full allocation. We get them from `zeMemGetAddressRange`/`cuMemGetAddressRange`.
General thoughts from my side:
Yes, we are aware of per-allocation properties vs. per-provider properties. The user could cache only per-provider properties if needed. Also, I don't think caching per-allocation properties makes sense (consider a free + alloc of the same pointer from a different provider). IMO, for per-allocation props we should just extend our existing API (`umfMemoryProviderGetMinPageSize`, etc.). So I would separate these problems and focus here only on per-provider properties.
But it is the main area where we can improve performance in MPI. MPI cannot cache the memory ID because, as you said, it can change across free/alloc. But UMF controls free/alloc and can update the cache accordingly.
For mem ID UMF can introduce its own mem ID and store it as part of the tracker |
OK, to be more precise: there is no sense in caching per-alloc properties on the user side. Of course, we could do it on the UMF side (using our allocation tracker), but accessing it would require a different set of API functions which accept an allocation ptr instead of a provider handle as an argument.
Under "testing" I meant that for every proposal we should consider how it will be used by MPI. As the first step, we do not even need to create a working POC; just a simple code snippet (like the one you created to demonstrate the idea of caching on the user side) should be enough.
The next question is: why not make all Mem Property APIs accept a pointer as a parameter? Even though some properties are provider-specific, internally we can resolve the provider from the pointer ourselves. P.S.: I am not pushing this approach, just trying to brainstorm and ask questions.
Since we currently have at least 3 more per-allocation functions to consider (alloc ID, base ptr, and size), I also think that this could be a good approach. And we can't use CTL here (but we can still use it internally in the implementation).
We may need two different types of API:
For both of them, ideally a single search in the cache should be enough to achieve the best performance.
One of the types may not be needed if we can achieve the same behavior with the other type without overhead.
We also need to consider representative benchmarks. Compute Benchmarks already contain something for Level Zero. We need to consider adding a UMF version or creating other representative benchmarks so that we can monitor improvements vs. L0.
I would also define "the roofline", e.g. the time of one search in the cache. We may choose some reasonable number of elements in the cache.
@vinser52, please comment on which of the proposed options you vote for (even considering per-allocation properties; the problem we face there is the same as with providers), or do you have an idea for a different proposal?
I would like to start with the most promising option; it would be a lot of work to create a POC for all options.
Yes, this would be helpful.
@irozanova What exactly is the memory allocation ID, and how do you use it?
@lplewa, it is a unique ID which CUDA/L0 returns for each allocation. We have caches associated with a pointer, and we need to know whether the pointer refers to the same memory or whether the memory was deallocated and a new allocation got the same address.
Do you need this information at the CUDA/L0 level, or do you need it at the UMF level? We can reuse memory without going back to the driver, so the ID at the driver level will be the same, but it might be a different allocation.
@lplewa, not necessarily the ID from CUDA/L0, but if the memory was not returned to the driver, then we need the same ID as before. A new ID would mean that the memory was returned to the driver and we need to remove outdated elements from the cache.
In UMF terms, each coarse-grain allocation from the memory provider has a unique memory ID. The memory ID is used to detect a new allocation that has the same VA.
Technically, these are the caches we are considering delegating to UMF going forward :)
Cache consistency/invalidation is what we wanted to stop doing in the higher-level runtime in general, i.e. the more caching happens in UMF, the better. The main scenario would be one where UMF intercepts alloc/free, so cache invalidation would happen inside some hook/interception layer.
UMF should offer a set of observability functions that can be used to retrieve the memory properties of memory allocated through UMF. Since these properties are closely tied to the provider used, the API should essentially return the provider's properties for a given pointer.
Requirements:
Currently, for a given `ptr`, a user can obtain a provider by first looking up the pool that owns the pointer and then querying that pool for its memory provider.
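A minimal sketch of that lookup, assuming the existing `umfPoolByPtr()` and `umfPoolGetMemoryProvider()` entry points (header names may differ between UMF versions):

```c
#include <umf/memory_pool.h>
#include <umf/memory_provider.h>

// Resolve the memory provider that backs a UMF-allocated pointer.
static umf_result_t get_provider_for_ptr(const void *ptr,
                                         umf_memory_provider_handle_t *provider) {
    umf_memory_pool_handle_t pool = umfPoolByPtr(ptr);
    if (pool == NULL) {
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }
    return umfPoolGetMemoryProvider(pool, provider);
}
```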
After obtaining the provider, there are several options for retrieving the memory provider properties:
Proposal 1 - per-provider get/set functions
In this proposal, the user needs to be aware of the provider's type. Each provider's property can then be retrieved in a manner similar to how it was set during creation. Additionally, there could be extra functions that do not have a corresponding "set" function for provider properties, such as a function that retrieves the general type of memory (e.g., CPU or GPU). If a specific provider doesn't know how to populate a given property, we could return a new error code `UMF_RESULT_INVALID_PARAM`.
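For illustration, such per-provider getters could look roughly like the declarations below; all function names are hypothetical and only mirror the existing per-provider "Set"-style params functions:

```c
// Hypothetical per-provider getters (illustrative names, not existing UMF API).
// Assumes <umf/memory_provider.h> plus the Level Zero/CUDA headers for device types.

// The caller must know which provider type it is querying:
umf_result_t umfLevelZeroMemoryProviderGetDevice(
    umf_memory_provider_handle_t provider, ze_device_handle_t *device);
umf_result_t umfCUDAMemoryProviderGetDevice(
    umf_memory_provider_handle_t provider, int *cu_device);

// A getter with no corresponding "set", e.g. the general type of memory;
// a provider that cannot answer would return the new UMF_RESULT_INVALID_PARAM.
umf_result_t umfMemoryProviderGetMemoryType(
    umf_memory_provider_handle_t provider, bool *is_cpu_accessible);
```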
Proposal 2 - common properties structure
In this proposal, we could define a common structure for provider properties, along with a function that returns it based on the provider handle. In this structure, we could maintain some properties, such as `is_cpu_accessible`, in a common scope, while storing provider-specific properties in unions (they could be nested). Additionally, we could introduce the type of the provider as one of the properties.
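A rough shape of such a structure (all type and field names below are hypothetical; the Level Zero handle types come from `<level_zero/ze_api.h>`):

```c
// Hypothetical common properties structure (illustrative only).
typedef enum umf_memory_provider_kind_t {
    UMF_PROVIDER_KIND_OS,
    UMF_PROVIDER_KIND_LEVEL_ZERO,
    UMF_PROVIDER_KIND_CUDA,
} umf_memory_provider_kind_t;

typedef struct umf_memory_provider_props_t {
    umf_memory_provider_kind_t type; // provider type exposed as a property
    bool is_cpu_accessible;          // common-scope property
    union {                          // provider-specific properties
        struct {
            ze_device_handle_t device;
            ze_context_handle_t context;
        } level_zero;
        struct {
            int device;
        } cuda;
    } u;
} umf_memory_provider_props_t;

// One call returns the whole structure for a given provider handle.
umf_result_t umfMemoryProviderGetProps(umf_memory_provider_handle_t provider,
                                       umf_memory_provider_props_t *props);
```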
Proposal 1 + 2 - generic per-property set of functions
This is a mix of proposals 1 and 2. The difference is that `umf_memory_provider_params_t` is hidden from the user, and there is a public list of generic (not provider-specific) per-property functions.
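For example, these generic per-property functions could look like the sketch below (hypothetical names; the underlying params structure stays internal):

```c
// Hypothetical generic per-property getters; umf_memory_provider_params_t is
// not exposed, so these work on any provider handle.
umf_result_t umfMemoryProviderGetIsCpuAccessible(
    umf_memory_provider_handle_t provider, bool *is_cpu_accessible);
umf_result_t umfMemoryProviderGetProviderKind(
    umf_memory_provider_handle_t provider, umf_memory_provider_kind_t *kind);
umf_result_t umfMemoryProviderGetDeviceHandle(
    umf_memory_provider_handle_t provider, void **device); // NULL for host providers
```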
Proposal 3A - per-provider key-value set based on strings
Each provider could keep a key-value store that could hold its properties. Then, the user could get the specific property from a provider using its name.
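For example (hypothetical API):

```c
// Hypothetical string-keyed property lookup; the value buffer is interpreted
// according to the property being queried.
umf_result_t umfMemoryProviderGetPropertyByName(
    umf_memory_provider_handle_t provider, const char *name,
    void *value, size_t size);

// Usage sketch:
// bool is_cpu_accessible;
// umfMemoryProviderGetPropertyByName(provider, "is_cpu_accessible",
//                                    &is_cpu_accessible, sizeof(is_cpu_accessible));
```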
Proposal 3B - per-provider key-value set based on IDs
Similarly to the string-based Get functions from Proposal 3A, we could use a property ID. The IDs could be pre-defined in public headers.
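The ID-based variant could look like this (hypothetical IDs and function name):

```c
// Hypothetical property IDs pre-defined in a public header.
typedef enum umf_memory_property_id_t {
    UMF_PROPERTY_IS_CPU_ACCESSIBLE = 0,
    UMF_PROPERTY_PROVIDER_KIND,
    UMF_PROPERTY_DEVICE_HANDLE,
} umf_memory_property_id_t;

umf_result_t umfMemoryProviderGetPropertyById(
    umf_memory_provider_handle_t provider, umf_memory_property_id_t id,
    void *value, size_t size);
```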
Proposal 4 - CTL
Similar to proposal 3A but based on CTL.
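Roughly, the same query routed through CTL might look like the snippet below; the path string and the getter signature shown here are assumptions, included only to illustrate the shape of a CTL-based query, not the current UMF CTL API:

```c
// Assumed CTL-style query (path and umfCtlGet signature are illustrative only).
bool is_cpu_accessible = false;
umf_result_t ret = umfCtlGet("umf.provider.by_handle.params.is_cpu_accessible",
                             provider, &is_cpu_accessible);
```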
Additional Considerations - per allocation properties
Please note that in the proposals above, we assumed that all pointer properties could be derived from the provider properties. However, this is not the case for certain attributes, such as the unique ID of the allocation (see `CU_POINTER_ATTRIBUTE_BUFFER_ID` for CUDA and `ze_memory_allocation_properties_t.id` for Level Zero) or the page size. To query the page size of an allocation, the user could use the generic `umfMemoryProviderGetMinPageSize(provider, ptr, &page_size)` function. However, we still need to define a new `umfMemoryProviderGetAllocationID(provider, ptr, &id)` function for retrieving the allocation ID.

Additional per-pointer properties to consider are the base pointer and the size of the full allocation (see `zeMemGetAddressRange`).
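Put together, the per-allocation queries could look like this; `umfMemoryProviderGetMinPageSize` already exists, `umfMemoryProviderGetAllocationID` is the new function proposed above (the ID type is an assumption), and the last declaration is only a hypothetical way to cover the base pointer and size:

```c
#include <stdint.h>
#include <umf/memory_provider.h>

// Existing: minimal page size for the allocation behind ptr.
umf_result_t umfMemoryProviderGetMinPageSize(umf_memory_provider_handle_t provider,
                                             const void *ptr, size_t *pageSize);

// Proposed: unique ID of the coarse-grain allocation backing ptr
// (cf. CU_POINTER_ATTRIBUTE_BUFFER_ID, ze_memory_allocation_properties_t.id).
umf_result_t umfMemoryProviderGetAllocationID(umf_memory_provider_handle_t provider,
                                              const void *ptr, uint64_t *id);

// Hypothetical: base pointer and size of the full allocation
// (cf. zeMemGetAddressRange / cuMemGetAddressRange).
umf_result_t umfMemoryProviderGetAllocationRange(umf_memory_provider_handle_t provider,
                                                 const void *ptr, void **base,
                                                 size_t *size);
```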
Hybrid proposal
It is also worth noting that we could achieve both flexibility (as in Proposal 4 - CTL) and performance by caching per-provider properties on the user side.
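A minimal sketch of such a user-side cache, keyed by the provider handle; the cache layout and the placeholder property field are illustrative, and the actual property query (CTL, props struct, etc.) is left as a stub:

```c
#include <stdbool.h>
#include <stddef.h>
#include <umf/memory_pool.h>
#include <umf/memory_provider.h>

// Placeholder for whatever per-provider properties the user cares about.
typedef struct provider_props_entry_t {
    umf_memory_provider_handle_t provider; // key
    bool is_cpu_accessible;                // cached property (example)
} provider_props_entry_t;

#define PROPS_CACHE_SIZE 64
static provider_props_entry_t props_cache[PROPS_CACHE_SIZE];
static size_t props_cache_len = 0;

// Look up (or populate) cached properties for the provider that backs ptr.
static const provider_props_entry_t *get_cached_props(const void *ptr) {
    umf_memory_pool_handle_t pool = umfPoolByPtr(ptr);
    if (pool == NULL) {
        return NULL;
    }
    umf_memory_provider_handle_t provider = NULL;
    if (umfPoolGetMemoryProvider(pool, &provider) != UMF_RESULT_SUCCESS) {
        return NULL;
    }

    // One linear search in a small cache; per-provider properties are stable,
    // so no invalidation is needed on free/alloc.
    for (size_t i = 0; i < props_cache_len; i++) {
        if (props_cache[i].provider == provider) {
            return &props_cache[i];
        }
    }

    if (props_cache_len == PROPS_CACHE_SIZE) {
        return NULL; // cache full; a real implementation would grow or evict
    }

    provider_props_entry_t *e = &props_cache[props_cache_len++];
    e->provider = provider;
    // Populate via whichever property-query API is chosen (CTL, struct, ...);
    // hard-coded here as a placeholder.
    e->is_cpu_accessible = true;
    return e;
}
```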
Pros / Cons
- per-provider get/set funcs
- common props struct
- generic per-property funcs
- per-provider key-value strings set
- per-provider key-value ID set
- CTL
- Hybrid CTL