Cuda interop vk13 #637

Open
wants to merge 64 commits into
base: master
Changes from 4 commits
Commits
d293b9b
create exportable buffers to import into cuda
atkurtul Jul 8, 2023
f5f1017
add missing cuda fn and update submodule
atkurtul Jul 9, 2023
6689b33
add missing cuda export functions
atkurtul Jul 9, 2023
9ade1c6
move boilerplates to CCUDADevice
atkurtul Jul 9, 2023
bfa7afc
correct chained cleanup desctruction order
atkurtul Jul 15, 2023
ddb861e
add safety checks
atkurtul Jul 15, 2023
f380398
semaphore interop
atkurtul Jul 15, 2023
2f7b517
get cuda interop working in vulkan_1_3 branch
atkurtul Jul 15, 2023
bd32f36
point jitify to the right hash
atkurtul Jan 4, 2024
b1c5a46
update examples && use non KHR version of vk functions
atkurtul Jan 4, 2024
0d36581
correct bad validations, KHR instead of coe func usage etc.
devshgraphicsprogramming Jan 4, 2024
725a984
revert a dangerous api change
devshgraphicsprogramming Jan 4, 2024
d2c9382
update examples_tests
devshgraphicsprogramming Jan 4, 2024
2d24604
Disabled CSPIRVIntrospector
Przemog1 Jan 5, 2024
2114e50
small fixes
Przemog1 Jan 5, 2024
f6320ce
remove unused cruft
devshgraphicsprogramming Jan 6, 2024
f749ab8
draft
devshgraphicsprogramming Jan 7, 2024
ad1e6ff
move the TimelineEventHandlers to their own header, simplifying every…
devshgraphicsprogramming Jan 8, 2024
a1afcc8
Made the TimelineEventHandlerST use a const ISemaphore, almost all of…
devshgraphicsprogramming Jan 8, 2024
262281f
implement MultiTimelineEventHandlerST and correct TimelineEventHandlerST
devshgraphicsprogramming Jan 8, 2024
d7690be
fix KHR function loading bugs
devshgraphicsprogramming Jan 8, 2024
13ff02a
fix some nasty bug in TimelineEventHandlerST
devshgraphicsprogramming Jan 8, 2024
fabc999
Take the TimelineEventHandlerST for a first spin with ICommandPoolCache
devshgraphicsprogramming Jan 8, 2024
0eb8e9a
turns out its quite easy to port the other utilities to the new Multi…
devshgraphicsprogramming Jan 8, 2024
e59408d
remove more unused stuff
devshgraphicsprogramming Jan 8, 2024
3f41a81
fix one liner huge bug
devshgraphicsprogramming Jan 8, 2024
fb1f50d
fix a smal bug and introduce a base class for TimelineEventHandler, a…
devshgraphicsprogramming Jan 9, 2024
94ee680
fix one more KHR function pointer bug and remove unused class
devshgraphicsprogramming Jan 9, 2024
c761d42
bring back bits of IUtilities needed for ex 05
devshgraphicsprogramming Jan 9, 2024
04689b9
device cap traits
atkurtul Dec 5, 2023
4a17eaf
port macros to boost pp
atkurtul Dec 5, 2023
5fcad02
has_member_x_with_type
atkurtul Dec 5, 2023
3c97ef1
make e_member_presence bitflags
atkurtul Dec 5, 2023
06b43af
Use new inline SPIR-V builtin syntax from DXC
devshgraphicsprogramming Jan 10, 2024
fd73e28
const correctness on surface capabilities
devshgraphicsprogramming Jan 12, 2024
153dd21
3D Blit test case was failing because of unimplemented functions for …
devshgraphicsprogramming Jan 12, 2024
bc7e24d
Make the SPhysicalDeviceFilter use spans for requirement arrays.
devshgraphicsprogramming Jan 12, 2024
b234d3b
ok so I found out that renderdoc hates External memory
devshgraphicsprogramming Jan 12, 2024
b5a633a
fix typos causing issues
devshgraphicsprogramming Jan 12, 2024
2ab33ed
API draft
devshgraphicsprogramming Jan 12, 2024
bbc5aa9
think about the other 3 utility functions
devshgraphicsprogramming Jan 12, 2024
d41f279
design clearing up
devshgraphicsprogramming Jan 12, 2024
04d05da
Ok we're done here with the Streaming Buffer upload port (removed the…
devshgraphicsprogramming Jan 12, 2024
3d034c5
move the SIntendedSubmitInfo struct out of IUtilities
devshgraphicsprogramming Jan 12, 2024
3160a46
going to sleep, next TODO is to implement the IUtilities::downloadBuf…
devshgraphicsprogramming Jan 12, 2024
8670d42
outline the TODO for @theoreticalphysicsftw
devshgraphicsprogramming Jan 13, 2024
2d86373
fix debugmessenger not being created
atkurtul Jan 13, 2024
ca2593c
fix a validation error
devshgraphicsprogramming Jan 13, 2024
461cb4a
rework pipeline barriers and events to use std::spans
devshgraphicsprogramming Jan 13, 2024
d96fd1d
Port `downloadBufferRangeViaStagingBuffer
devshgraphicsprogramming Jan 13, 2024
2d2acc9
fix bug in CRAIISpanPatch
devshgraphicsprogramming Jan 13, 2024
60c1c39
Ported Example 23, and fixed a few bugs here and there
devshgraphicsprogramming Jan 14, 2024
3faf1fb
merge conflicts
atkurtul Jan 13, 2024
fd4f733
add missing external resource property queries
atkurtul Jan 14, 2024
5b1940c
add more stuff
atkurtul Jan 14, 2024
7074256
Merge branch 'vulkan_1_3' into cuda-interop-vk13
atkurtul Jan 14, 2024
6449b2f
Merge branch 'vulkan_1_3' into cuda-interop-vk13
atkurtul Jan 18, 2024
3d9a530
address pr comments
atkurtul Jan 18, 2024
4d174e5
last commit part 2
atkurtul Jan 18, 2024
cbd18f4
add missing cuda fn & map queue indices to vk
atkurtul Jan 18, 2024
23fe8d4
update submodule
atkurtul Jan 18, 2024
c32fd79
cache cuda devices
atkurtul Jan 18, 2024
4e2185c
ifdef platform code
atkurtul Jan 19, 2024
bd0b76a
log queue validation warning
atkurtul Jan 19, 2024
2 changes: 1 addition & 1 deletion examples_tests
19 changes: 9 additions & 10 deletions include/nbl/video/CCUDADevice.h
@@ -37,6 +37,13 @@ class CCUDADevice : public core::IReferenceCounted
static constexpr IDeviceMemoryBacked::E_EXTERNAL_HANDLE_TYPE EXTERNAL_MEMORY_HANDLE_TYPE = IDeviceMemoryBacked::EHT_OPAQUE_FD;
static constexpr CUmemAllocationHandleType ALLOCATION_TYPE = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;
#endif
struct SCUDACleaner : video::ICleanup
{
core::smart_refctd_ptr<const core::IReferenceCounted> resource;
SCUDACleaner(core::smart_refctd_ptr<const core::IReferenceCounted> resource)
: resource(std::move(resource))
{ }
};

enum E_VIRTUAL_ARCHITECTURE
{
@@ -95,18 +102,10 @@ class CCUDADevice : public core::IReferenceCounted
protected:
CUresult reserveAdrressAndMapMemory(CUdeviceptr* outPtr, size_t size, size_t alignment, CUmemLocationType location, CUmemGenericAllocationHandle memory);


// CUDAHandler creates CUDADevice, it needs to access ctor
friend class CCUDAHandler;
friend class CCUDASharedMemory;
friend class CCUDASharedSemaphore;

struct SCUDACleaner : video::ICleanup
{
core::smart_refctd_ptr<const core::IReferenceCounted> resource;
SCUDACleaner(core::smart_refctd_ptr<const core::IReferenceCounted> resource)
: resource(std::move(resource))
{ }
};

CCUDADevice(core::smart_refctd_ptr<CVulkanConnection>&& _vulkanConnection, IPhysicalDevice* const _vulkanDevice, const E_VIRTUAL_ARCHITECTURE _virtualArchitecture, CUdevice _handle, core::smart_refctd_ptr<CCUDAHandler>&& _handler);
~CCUDADevice();
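
The `SCUDACleaner` moved in this hunk only holds a strong reference, so the CUDA-side resource cannot be destroyed before the Vulkan object that imported it. A minimal sketch of that keep-alive pattern follows, using `std::shared_ptr` in place of `core::smart_refctd_ptr`. Every type in it (`ICleanup`, `CudaResource`, `Cleaner`, `VulkanAllocation`) is a hypothetical stand-in, not the Nabla class of the same role.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for video::ICleanup: an object destroyed together with its owner.
struct ICleanup
{
    virtual ~ICleanup() = default;
};

// Hypothetical stand-in for a CUDA-side resource such as CCUDASharedMemory.
// It flips a flag so its lifetime can be observed from outside.
struct CudaResource
{
    explicit CudaResource(bool* aliveFlag) : alive(aliveFlag) { *alive = true; }
    ~CudaResource() { *alive = false; }
    bool* alive;
};

// Same shape as SCUDACleaner: no logic, it just owns a reference.
struct Cleaner : ICleanup
{
    std::shared_ptr<const CudaResource> resource;
    explicit Cleaner(std::shared_ptr<const CudaResource> res) : resource(std::move(res)) {}
};

// Hypothetical stand-in for the Vulkan-side allocation that imported the CUDA memory.
struct VulkanAllocation
{
    std::unique_ptr<ICleanup> postDestroyCleanup;
};

// Returns true if the resource stays alive while the allocation exists
// and is released once the allocation (and its cleaner) is destroyed.
inline bool cleanerKeepsResourceAlive()
{
    bool alive = false;
    auto alloc = std::make_unique<VulkanAllocation>();
    {
        auto res = std::make_shared<const CudaResource>(&alive);
        alloc->postDestroyCleanup = std::make_unique<Cleaner>(std::move(res));
    }
    const bool aliveWhileImported = alive; // the caller's reference is gone, the cleaner's is not
    alloc.reset();                         // destroys the cleaner, dropping the last reference
    return aliveWhileImported && !alive;
}
```

Calling `cleanerKeepsResourceAlive()` exercises both halves of the contract: the resource survives the caller dropping its own reference, and it is released when the importing object goes away.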

1 change: 1 addition & 0 deletions include/nbl/video/CCUDAHandler.h
@@ -136,6 +136,7 @@ class CCUDAHandler : public core::IReferenceCounted
,cuDestroyExternalSemaphore
,cuImportExternalSemaphore
,cuSignalExternalSemaphoresAsync
,cuWaitExternalSemaphoresAsync
);
const CUDA& getCUDAFunctionTable() const {return m_cuda;}

5 changes: 3 additions & 2 deletions include/nbl/video/CCUDASharedMemory.h
@@ -23,6 +23,7 @@ namespace nbl::video
class CCUDASharedMemory : public core::IReferenceCounted
{
public:
// required for us to see the move ctor
friend class CCUDADevice;

CUdeviceptr getDeviceptr() const { return m_params.ptr; }
@@ -49,11 +50,11 @@ class CCUDASharedMemory : public core::IReferenceCounted

core::smart_refctd_ptr<IDeviceMemoryAllocation> exportAsMemory(ILogicalDevice* device, IDeviceMemoryBacked* dedication = nullptr) const;

core::smart_refctd_ptr<IGPUImage> createAndBindImage(ILogicalDevice* device, IGPUImage::SCreationParams&& params) const;
core::smart_refctd_ptr<IGPUImage> createAndBindImage(ILogicalDevice* device, asset::IImage::SCreationParams&& params) const;

protected:

CCUDASharedMemory(core::smart_refctd_ptr<CCUDADevice> device, SCachedCreationParams&& params)
CCUDASharedMemory(core::smart_refctd_ptr<CCUDADevice>&& device, SCachedCreationParams&& params)
: m_device(std::move(device))
, m_params(std::move(params))
{}
27 changes: 17 additions & 10 deletions include/nbl/video/IDeviceMemoryAllocation.h
@@ -164,14 +164,21 @@ class IDeviceMemoryAllocation : public virtual core::IReferenceCounted
//! Constant variant of getMappedPointer
inline const void* getMappedPointer() const { return m_mappedPtr; }

struct SCreationParams
struct SInfo
{
uint64_t allocationSize = 0;
core::bitflag<IDeviceMemoryAllocation::E_MEMORY_ALLOCATE_FLAGS> allocateFlags = IDeviceMemoryAllocation::EMAF_NONE;
// Handle Type for external resources
IDeviceMemoryAllocation::E_EXTERNAL_HANDLE_TYPE externalHandleType = IDeviceMemoryAllocation::EHT_NONE;
//! Imports the given handle if externalHandle != nullptr && externalHandleType != EHT_NONE
//! Creates exportable memory if externalHandle == nullptr && externalHandleType != EHT_NONE
void* externalHandle = nullptr;
};

struct SCreationParams: SInfo
{
core::bitflag<E_MEMORY_ALLOCATE_FLAGS> allocateFlags = E_MEMORY_ALLOCATE_FLAGS::EMAF_NONE;
core::bitflag<E_MEMORY_PROPERTY_FLAGS> memoryPropertyFlags = E_MEMORY_PROPERTY_FLAGS::EMPF_NONE;
E_EXTERNAL_HANDLE_TYPE externalHandleType = E_EXTERNAL_HANDLE_TYPE::EHT_NONE;
void* externalHandle = nullptr;
const bool dedicated = false;
const size_t allocationSize;
};
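
The two comments on `externalHandle` in `SInfo` describe three situations: no external memory, exportable memory, and an imported handle. A small sketch of that decision table follows. The enum values and the `ExternalMode` type are hypothetical, and treating "handle set but type is `EHT_NONE`" as invalid is an assumption, since the comments do not cover that case.

```cpp
#include <cassert>

// Hypothetical, trimmed-down handle type enum; the real values may differ.
enum E_EXTERNAL_HANDLE_TYPE : unsigned
{
    EHT_NONE = 0u,
    EHT_OPAQUE_FD = 1u
};

// Hypothetical classification of what an allocation request asks for.
enum class ExternalMode
{
    None,    // ordinary allocation
    Export,  // create memory that can later be exported
    Import,  // wrap an existing OS handle
    Invalid  // assumption: a handle without a handle type is rejected
};

inline ExternalMode classify(E_EXTERNAL_HANDLE_TYPE type, const void* externalHandle)
{
    if (type == EHT_NONE)
        return externalHandle ? ExternalMode::Invalid : ExternalMode::None;
    // externalHandleType != EHT_NONE from here on
    return externalHandle ? ExternalMode::Import : ExternalMode::Export;
}
```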

protected:
@@ -183,21 +190,21 @@ class IDeviceMemoryAllocation : public virtual core::IReferenceCounted
IDeviceMemoryAllocation(
const ILogicalDevice* originDevice, SCreationParams&& params = {})
: m_originDevice(originDevice)
, m_params(std::move(params))
, m_mappedPtr(nullptr)
, m_mappedRange{ 0, 0 }
, m_currentMappingAccess(EMCAF_NO_MAPPING_ACCESS)
, m_params(std::move(params))
{}

virtual void* map_impl(const MemoryRange& range, const core::bitflag<E_MAPPING_CPU_ACCESS_FLAGS> accessHint) = 0;
virtual bool unmap_impl() = 0;


const ILogicalDevice* m_originDevice = nullptr;
uint8_t* m_mappedPtr;
MemoryRange m_mappedRange;
core::bitflag<E_MAPPING_CPU_ACCESS_FLAGS> m_currentMappingAccess;
SCreationParams m_params;
SCreationParams m_params = {};
uint8_t* m_mappedPtr = nullptr;
MemoryRange m_mappedRange = {};
core::bitflag<E_MAPPING_CPU_ACCESS_FLAGS> m_currentMappingAccess = EMCAF_NO_MAPPING_ACCESS;
std::unique_ptr<struct ICleanup> m_postDestroyCleanup = nullptr;
};

16 changes: 4 additions & 12 deletions include/nbl/video/IDeviceMemoryAllocator.h
@@ -12,19 +12,11 @@ namespace nbl::video
class IDeviceMemoryAllocator
{
public:
struct SAllocateInfo
struct SAllocateInfo: IDeviceMemoryAllocation::SInfo
{
size_t size : 54 = 0ull;
size_t flags : 5 = 0u; // IDeviceMemoryAllocation::E_MEMORY_ALLOCATE_FLAGS
size_t memoryTypeIndex : 5 = 0u;
uint32_t memoryTypeIndex = 0u;
IDeviceMemoryBacked* dedication = nullptr; // if you make the info have a `dedication` the memory will be bound right away, also it will use VK_KHR_dedicated_allocation on vulkan
// size_t opaqueCaptureAddress = 0u; Note that this mechanism is intended only to support capture/replay tools, and is not recommended for use in other applications.

// Handle Type for external resources
IDeviceMemoryAllocation::E_EXTERNAL_HANDLE_TYPE externalHandleType = IDeviceMemoryAllocation::EHT_NONE;
//! Imports the given handle if externalHandle != nullptr && externalHandleType != EHT_NONE
//! Creates exportable memory if externalHandle == nullptr && externalHandleType != EHT_NONE
void* externalHandle = nullptr;
};

//! IMemoryTypeIterator extracts memoryType indices from memoryTypeBits in arbitrary order
@@ -54,8 +46,8 @@ class IDeviceMemoryAllocator
inline SAllocateInfo operator()(IDeviceMemoryBacked* dedication)
{
SAllocateInfo ret = {};
ret.size = m_reqs.size;
ret.flags = m_allocateFlags;
ret.allocationSize = m_reqs.size;
ret.allocateFlags = core::bitflag<IDeviceMemoryAllocation::E_MEMORY_ALLOCATE_FLAGS>(m_allocateFlags);
ret.memoryTypeIndex = dereference();
ret.dedication = dedication;
ret.externalHandleType = m_handleType;
2 changes: 1 addition & 1 deletion include/nbl/video/IDeviceMemoryBacked.h
@@ -126,7 +126,7 @@ class IDeviceMemoryBacked : public IBackendObject

//! members
SCachedCreationParams m_cachedCreationParams;
SDeviceMemoryRequirements m_cachedMemoryReqs;
const SDeviceMemoryRequirements m_cachedMemoryReqs;
void* m_cachedExternalHandle = nullptr;
Member Author

what even sets this?

Member Author

see #637 (comment), no need for extra member

};

31 changes: 8 additions & 23 deletions include/nbl/video/ILogicalDevice.h
@@ -147,7 +147,7 @@ class NBL_API2 ILogicalDevice : public core::IReferenceCounted, public IDeviceMe
virtual IQueue::RESULT waitIdle() const = 0;

//! Semaphore Stuff
virtual core::smart_refctd_ptr<ISemaphore> createSemaphore(ISemaphore::SCreationParams&&) = 0;
virtual core::smart_refctd_ptr<ISemaphore> createSemaphore(uint64_t initialValue = 0, ISemaphore::SCreationParams&& = {}) = 0;
virtual ISemaphore::WAIT_RESULT waitForSemaphores(const std::span<const ISemaphore::SWaitInfo> infos, const bool waitAll, const uint64_t timeout) = 0;
// Forever waiting variant if you're confident that the fence will eventually be signalled
inline ISemaphore::WAIT_RESULT blockForSemaphores(const std::span<const ISemaphore::SWaitInfo> infos, const bool waitAll=true)
@@ -285,29 +285,14 @@ class NBL_API2 ILogicalDevice : public core::IReferenceCounted, public IDeviceMe

//! Descriptor Creation
// Buffer (@see ICPUBuffer)
inline core::smart_refctd_ptr<IGPUBuffer> createBuffer(IGPUBuffer::SCreationParams&& creationParams)
{
const auto maxSize = getPhysicalDeviceLimits().maxBufferSize;
if (creationParams.size>maxSize)
{
m_logger.log("Failed to create Buffer, size %d larger than Device %p's limit!",system::ILogger::ELL_ERROR,creationParams.size,this,maxSize);
return nullptr;
}
return createBuffer_impl(std::move(creationParams));
}
core::smart_refctd_ptr<IGPUBuffer> createBuffer(IGPUBuffer::SCreationParams&& creationParams);

// Create a BufferView, to a shader; a fake 1D-like texture with no interpolation (@see ICPUBufferView)
core::smart_refctd_ptr<IGPUBufferView> createBufferView(const asset::SBufferRange<const IGPUBuffer>& underlying, const asset::E_FORMAT _fmt);

// Creates an Image (@see ICPUImage)
inline core::smart_refctd_ptr<IGPUImage> createImage(IGPUImage::SCreationParams&& creationParams)
{
if (!IGPUImage::validateCreationParameters(creationParams))
{
m_logger.log("Failed to create Image, invalid creation parameters!",system::ILogger::ELL_ERROR);
return nullptr;
}
// TODO: @Cyprian validation of creationParams against the device's limits (sample counts, etc.) see vkCreateImage
return createImage_impl(std::move(creationParams));
}
core::smart_refctd_ptr<IGPUImage> createImage(IGPUImage::SCreationParams&& params);

// Create an ImageView that can actually be used by shaders (@see ICPUImageView)
inline core::smart_refctd_ptr<IGPUImageView> createImageView(IGPUImageView::SCreationParams&& params)
{
@@ -765,9 +750,9 @@ class NBL_API2 ILogicalDevice : public core::IReferenceCounted, public IDeviceMe
virtual bool bindBufferMemory_impl(const uint32_t count, const SBindBufferMemoryInfo* pInfos) = 0;
virtual bool bindImageMemory_impl(const uint32_t count, const SBindImageMemoryInfo* pInfos) = 0;

virtual core::smart_refctd_ptr<IGPUBuffer> createBuffer_impl(IGPUBuffer::SCreationParams&& creationParams) = 0;
virtual core::smart_refctd_ptr<IGPUBuffer> createBuffer_impl(IGPUBuffer::SCreationParams&& creationParams, bool dedicatedOnly = false) = 0;
virtual core::smart_refctd_ptr<IGPUBufferView> createBufferView_impl(const asset::SBufferRange<const IGPUBuffer>& underlying, const asset::E_FORMAT _fmt) = 0;
virtual core::smart_refctd_ptr<IGPUImage> createImage_impl(IGPUImage::SCreationParams&& params) = 0;
virtual core::smart_refctd_ptr<IGPUImage> createImage_impl(IGPUImage::SCreationParams&& params, bool dedicatedOnly = false) = 0;
virtual core::smart_refctd_ptr<IGPUImageView> createImageView_impl(IGPUImageView::SCreationParams&& params) = 0;
virtual core::smart_refctd_ptr<IGPUBottomLevelAccelerationStructure> createBottomLevelAccelerationStructure_impl(IGPUAccelerationStructure::SCreationParams&& params) = 0;
virtual core::smart_refctd_ptr<IGPUTopLevelAccelerationStructure> createTopLevelAccelerationStructure_impl(IGPUTopLevelAccelerationStructure::SCreationParams&& params) = 0;
4 changes: 1 addition & 3 deletions include/nbl/video/ISemaphore.h
@@ -69,8 +69,6 @@ class ISemaphore : public IBackendObject
//! Imports the given handle if externalHandle != nullptr && externalMemoryHandleType != EHT_NONE
//! Creates exportable memory if externalHandle == nullptr && externalMemoryHandleType != EHT_NONE
void* externalHandle = nullptr;

uint64_t initialValue = 0;
};

auto const& getCreationParams() const
@@ -85,7 +83,7 @@ class ISemaphore : public IBackendObject
{}
virtual ~ISemaphore() = default;

SCreationParams m_creationParams;
const SCreationParams m_creationParams;
};

}
4 changes: 0 additions & 4 deletions include/nbl/video/SPhysicalDeviceLimits.h
@@ -552,10 +552,6 @@ struct SPhysicalDeviceLimits
/* CooperativeMatrixPropertiesKHR *//* VK_KHR_cooperative_matrix */
core::bitflag<asset::IShader::E_SHADER_STAGE> cooperativeMatrixSupportedStages = asset::IShader::ESS_UNKNOWN;

bool externalFenceWin32 = false; /* VK_KHR_external_fence_win32 */ // [TODO] requires instance extensions, add them
bool externalMemoryWin32 = false; /* VK_KHR_external_memory_win32 */ // [TODO] requires instance extensions, add them
bool externalSemaphoreWin32 = false; /* VK_KHR_external_semaphore_win32 */ // [TODO] requires instance extensions, add them

/* Always enabled if available, reported as limits */

// Core 1.0 Features
2 changes: 1 addition & 1 deletion include/nbl/video/utilities/IUtilities.h
@@ -234,7 +234,7 @@ class NBL_API2 IUtilities : public core::IReferenceCounted
//! WARNING: This function blocks CPU and stalls the GPU!
inline bool autoSubmitAndBlock(const SIntendedSubmitInfo::SFrontHalf& submit, const std::function<bool(SIntendedSubmitInfo&)>& what)
{
auto semaphore = m_device->createSemaphore(ISemaphore::SCreationParams{.initialValue=0});
auto semaphore = m_device->createSemaphore();
// so we begin latching everything on the value of 1, but if we overflow it increases
IQueue::SSubmitInfo::SSemaphoreInfo info = {semaphore.get(),1};

2 changes: 1 addition & 1 deletion src/nbl/video/CCUDADevice.cpp
@@ -139,7 +139,7 @@ CUresult CCUDADevice::importGPUSemaphore(core::smart_refctd_ptr<CCUDASharedSemap

if (!handleType.hasFlags(ISemaphore::EHT_OPAQUE_WIN32) || !handle)
return CUDA_ERROR_INVALID_VALUE;

CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC desc = {
.type = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_TIMELINE_SEMAPHORE_WIN32,
.handle = {.win32 = {.handle = handle }},
46 changes: 7 additions & 39 deletions src/nbl/video/CCUDASharedMemory.cpp
@@ -11,9 +11,11 @@ namespace nbl::video
core::smart_refctd_ptr<IDeviceMemoryAllocation> CCUDASharedMemory::exportAsMemory(ILogicalDevice* device, IDeviceMemoryBacked* dedication) const
{
IDeviceMemoryAllocator::SAllocateInfo info = {
.size = m_params.granularSize,
.externalHandleType = CCUDADevice::EXTERNAL_MEMORY_HANDLE_TYPE,
.externalHandle = m_params.osHandle,
{
.allocationSize = m_params.granularSize,
.externalHandleType = CCUDADevice::EXTERNAL_MEMORY_HANDLE_TYPE,
.externalHandle = m_params.osHandle,
}
};

auto pd = device->getPhysicalDevice();
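
The extra pair of braces in the `SAllocateInfo` initializer above follows from `SAllocateInfo` now deriving from `IDeviceMemoryAllocation::SInfo`. In C++20 aggregate initialization, designators can only name direct members, so the base subobject gets its own nested initializer list. A minimal sketch with trimmed-down, hypothetical stand-ins for the two structs (field types simplified to plain integers and pointers):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for IDeviceMemoryAllocation::SInfo.
struct SInfo
{
    std::uint64_t allocationSize = 0;
    std::uint32_t externalHandleType = 0;
    void* externalHandle = nullptr;
};

// Hypothetical, trimmed-down stand-in for IDeviceMemoryAllocator::SAllocateInfo.
struct SAllocateInfo : SInfo
{
    std::uint32_t memoryTypeIndex = 0u;
    void* dedication = nullptr;
};

inline SAllocateInfo makeImportInfo(std::uint64_t size, std::uint32_t handleType, void* handle)
{
    // The inner braces initialize the SInfo base subobject. The designators
    // live inside that inner list because they name SInfo's own members.
    // Members of SAllocateInfo that are not listed keep their defaults.
    SAllocateInfo info = {
        {
            .allocationSize = size,
            .externalHandleType = handleType,
            .externalHandle = handle,
        }
    };
    return info;
}
```

The designators must also appear in declaration order, which is why `allocationSize` comes before the two handle fields.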
@@ -43,46 +45,12 @@ core::smart_refctd_ptr<IDeviceMemoryAllocation> CCUDASharedMemory::exportAsMemor
std::make_unique<CCUDADevice::SCUDACleaner>(core::smart_refctd_ptr<const CCUDASharedMemory>(this))).memory;
}

#if 0
core::smart_refctd_ptr<IGPUBuffer> CCUDASharedMemory::exportAsBuffer(ILogicalDevice* device, core::bitflag<asset::IBuffer::E_USAGE_FLAGS> usage) const
core::smart_refctd_ptr<IGPUImage> CCUDASharedMemory::createAndBindImage(ILogicalDevice* device, asset::IImage::SCreationParams&& params) const
{
if (!device || !m_device->isMatchingDevice(device->getPhysicalDevice()))
return nullptr;

auto buf = device->createBuffer({{
.size = m_params.granularSize,
.usage = usage }, {{
.postDestroyCleanup = std::make_unique<CCUDADevice::SCUDACleaner>(core::smart_refctd_ptr<const CCUDASharedMemory>(this)),
.externalHandleTypes = CCUDADevice::EXTERNAL_MEMORY_HANDLE_TYPE,
.externalHandle = m_params.osHandle
}}});

auto req = buf->getMemoryReqs();
auto pd = device->getPhysicalDevice();
switch (m_params.location)
{
case CU_MEM_LOCATION_TYPE_DEVICE: req.memoryTypeBits &= pd->getDeviceLocalMemoryTypeBits(); break;
case CU_MEM_LOCATION_TYPE_HOST: req.memoryTypeBits &= pd->getHostVisibleMemoryTypeBits(); break;
// TODO(Atil): Figure out how to handle these
case CU_MEM_LOCATION_TYPE_HOST_NUMA:
case CU_MEM_LOCATION_TYPE_HOST_NUMA_CURRENT:
default: break;
}

if (!device->allocate(req, buf.get()).isValid())
return nullptr;

return buf;
}

#endif

core::smart_refctd_ptr<IGPUImage> CCUDASharedMemory::createAndBindImage(ILogicalDevice* device, IGPUImage::SCreationParams&& params) const
{
if (!device || !m_device->isMatchingDevice(device->getPhysicalDevice()))
return nullptr;

auto img = device->createImage(std::move(params));
auto img = device->createImage({ std::move(params), { {.externalHandleTypes = CCUDADevice::EXTERNAL_MEMORY_HANDLE_TYPE } }, IGPUImage::TILING::LINEAR });

if (exportAsMemory(device, img.get()))
return img;
28 changes: 22 additions & 6 deletions src/nbl/video/CVulkanCommandBuffer.cpp
@@ -48,25 +48,41 @@ void fill(vk_barrier_t& out, const ResourceBarrier& in, uint32_t selfQueueFamily
// https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkBufferMemoryBarrier2-buffer-04088
if (concurrentSharing)
selfQueueFamilyIndex = IQueue::FamilyIgnored;

auto mapQFIdx = [](uint32_t idx)
{
switch (idx)
{
case IQueue::FamilyExternal:
case IQueue::FamilyIgnored:
case IQueue::FamilyForeign:
idx |= 1u << 31;
Member Author

This is kind of flaky, because it relies on my encoding, in which I only use 1 bit for something else. I'd rather do a real switch-case that returns the VK enum.

break;
}
return idx;
};

if constexpr (!std::is_same_v<vk_barrier_t,VkMemoryBarrier2>)
{
out.srcQueueFamilyIndex = selfQueueFamilyIndex;
out.dstQueueFamilyIndex = selfQueueFamilyIndex;
out.srcQueueFamilyIndex = mapQFIdx(selfQueueFamilyIndex);
out.dstQueueFamilyIndex = mapQFIdx(selfQueueFamilyIndex);
}
const asset::SMemoryBarrier* memoryBarrier;
if constexpr (std::is_same_v<IGPUCommandBuffer::SOwnershipTransferBarrier,ResourceBarrier>)
{
memoryBarrier = &in.dep;
// in.otherQueueFamilyIndex==selfQueueFamilyIndex not resulting in ownership transfer is implicit
if (!concurrentSharing && in.otherQueueFamilyIndex!=IQueue::FamilyIgnored)
switch (in.ownershipOp)
if (!concurrentSharing && in.otherQueueFamilyIndex != IQueue::FamilyIgnored)
{
switch (in.ownershipOp)
{
case IGPUCommandBuffer::SOwnershipTransferBarrier::OWNERSHIP_OP::RELEASE:
out.dstQueueFamilyIndex = in.otherQueueFamilyIndex;
out.dstQueueFamilyIndex = mapQFIdx(in.otherQueueFamilyIndex);
break;
case IGPUCommandBuffer::SOwnershipTransferBarrier::OWNERSHIP_OP::ACQUIRE:
out.srcQueueFamilyIndex = in.otherQueueFamilyIndex;
out.srcQueueFamilyIndex = mapQFIdx(in.otherQueueFamilyIndex);
break;
}
}
}
else
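
The review comment on `mapQFIdx` asks for an explicit switch that returns the Vulkan constants instead of OR-ing in the top bit. A minimal sketch of that alternative follows. The Vulkan values are restated as local constants so the snippet compiles without `<vulkan/vulkan.h>`. The `engine::` values are hypothetical placeholders for `IQueue::FamilyIgnored`, `IQueue::FamilyExternal` and `IQueue::FamilyForeign`, whose real encoding is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Values of the special Vulkan queue family indices, restated locally.
constexpr std::uint32_t kVkQueueFamilyIgnored  = ~0u; // VK_QUEUE_FAMILY_IGNORED
constexpr std::uint32_t kVkQueueFamilyExternal = ~1u; // VK_QUEUE_FAMILY_EXTERNAL
constexpr std::uint32_t kVkQueueFamilyForeign  = ~2u; // VK_QUEUE_FAMILY_FOREIGN_EXT

// Hypothetical engine-side encoding, assumed here to be 31-bit values.
namespace engine
{
constexpr std::uint32_t FamilyIgnored  = 0x7fffffffu;
constexpr std::uint32_t FamilyExternal = 0x7ffffffeu;
constexpr std::uint32_t FamilyForeign  = 0x7ffffffdu;
}

// Explicit mapping: each special value returns its Vulkan counterpart,
// every ordinary queue family index passes through unchanged.
constexpr std::uint32_t mapQueueFamilyIndex(std::uint32_t idx)
{
    switch (idx)
    {
        case engine::FamilyIgnored:  return kVkQueueFamilyIgnored;
        case engine::FamilyExternal: return kVkQueueFamilyExternal;
        case engine::FamilyForeign:  return kVkQueueFamilyForeign;
        default:                     return idx;
    }
}
```

Written this way, the mapping no longer depends on how the engine packs its special values, so changing that encoding later would not silently produce wrong Vulkan indices.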
1 change: 0 additions & 1 deletion src/nbl/video/CVulkanImage.cpp
@@ -9,7 +9,6 @@ namespace nbl::video
CVulkanImage::~CVulkanImage()
{
preDestroyStep();
// don't destroy imported handles
Member Author

new text

// e.g. don't destroy imported handles from the same VkInstance (e.g. if hooking into external Vulkan codebase)
// truly EXTERNAL_MEMORY imported handles, do need to be destroyed + CloseHandled (separate thing)

if (!m_cachedCreationParams.skipHandleDestroy)
{
const CVulkanLogicalDevice* vulkanDevice = static_cast<const CVulkanLogicalDevice*>(getOriginDevice());