Streamlining Subpasses - Khronos Blog


Just over two years ago, the Khronos Vulkan® Working Group introduced the VK_KHR_dynamic_rendering extension in a blog post titled “Streamlining Render Passes”. This extension allowed developers to bypass the creation of the complex render pass and framebuffer objects that had been in Vulkan since version 1.0, significantly reducing the complexity required to start rendering in Vulkan.

While VK_KHR_dynamic_rendering solved several problems with rendering, developers using input attachments or subpasses could not port to the new API, because dynamic rendering offered no equivalent functionality.

Today we’re happy to announce the release of VK_KHR_dynamic_rendering_local_read, which adds support for local dependencies to dynamic rendering, enabling developers to move over fully to dynamic rendering as support is rolled out.

Local Reads

The primary functionality exposed in this extension is the ability to execute pipeline barriers using VK_DEPENDENCY_BY_REGION_BIT inside a dynamic render pass, allowing framebuffer-local dependencies between draw calls on either side of the barrier. These dependencies can be between rendering attachments and input attachments (as in the original render pass API), or additionally between storage resources or pointer accesses for greater flexibility.

Illustration of the stages of a deferred renderer, all within a single dynamic render pass. Credit: Sascha Willems (https://www.saschawillems.de), from the Vulkan Samples Repository.

Local reads can be used for techniques such as order-independent transparency, deferred rendering, or simple post-processing techniques that do not rely on neighboring values. A sample showing how to use this extension should be available in the Vulkan samples repository soon.

Mapping to Subpasses

For newly written applications, using this extension is relatively straightforward: there’s a new layout for attachments to use, and input attachments still need descriptors created for them as before, but no other setup is required. To read a storage image with a local dependency, insert a barrier with the “by region” flag, and the fragment shaders that follow can read it:

VkMemoryBarrier2 memoryBarrier = {
    .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER_2,
    .srcStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT,
    .srcAccessMask = VK_ACCESS_2_SHADER_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_2_SHADER_READ_BIT };

VkDependencyInfo dependencyInfo = {
    .sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT,
    .memoryBarrierCount = 1,
    .pMemoryBarriers = &memoryBarrier };

vkCmdPipelineBarrier2(commandBuffer, &dependencyInfo);

For applications looking to port content from render pass objects, we’ve added a few extra bits to the API to map your existing shader code bases to this new API. Color attachments can be remapped to different indices between pipelines to allow emulation of switching subpasses, and input attachment indices can be remapped to different color, depth, or stencil attachments, or directly to bound descriptors.

typedef struct VkRenderingAttachmentLocationInfoKHR {
    VkStructureType                 sType;
    const void*                     pNext;
    uint32_t                        colorAttachmentCount;
    const uint32_t*                 pColorAttachmentLocations;
} VkRenderingAttachmentLocationInfoKHR;

void vkCmdSetRenderingAttachmentLocationsKHR(
    VkCommandBuffer                             commandBuffer,
    const VkRenderingAttachmentLocationInfoKHR* pLocationInfo);

typedef struct VkRenderingInputAttachmentIndexInfoKHR {
    VkStructureType                 sType;
    const void*                     pNext;
    uint32_t                        colorAttachmentCount;
    const uint32_t*                 pColorAttachmentInputIndices;
    const uint32_t*                 pDepthInputAttachmentIndex;
    const uint32_t*                 pStencilInputAttachmentIndex;
} VkRenderingInputAttachmentIndexInfoKHR;

void vkCmdSetRenderingInputAttachmentIndicesKHR(
    VkCommandBuffer                                 commandBuffer,
    const VkRenderingInputAttachmentIndexInfoKHR*   pInputAttachmentIndexInfo);
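
When porting, note that a subpass description and these tables are indexed in opposite directions. A subpass lists, for each shader location or input attachment index, which attachment it uses. The arrays above list, for each color attachment, which location or input attachment index it maps to. Below is a minimal sketch of that inversion, assuming the subpass references have already been translated into indices into the dynamic render pass's color attachment array. `invert_subpass_refs` and `ATTACHMENT_UNUSED` are hypothetical names for illustration, not part of the Vulkan API. Plain `uint32_t` arrays are used so the block compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as VK_ATTACHMENT_UNUSED (~0U) in the Vulkan headers. */
#define ATTACHMENT_UNUSED UINT32_MAX

/* Hypothetical helper.
 * refs[slot] = color attachment used by shader location (or input
 *              attachment index) `slot`, as in a subpass description.
 * out[att]   = slot that color attachment `att` maps to, or
 *              ATTACHMENT_UNUSED if this "subpass" does not use it.
 * `out` must have room for attachment_count entries. */
void invert_subpass_refs(const uint32_t *refs, uint32_t ref_count,
                         uint32_t *out, uint32_t attachment_count)
{
    assert(out != NULL || attachment_count == 0);
    assert(refs != NULL || ref_count == 0);

    /* Start with every attachment hidden from this pipeline. */
    for (uint32_t att = 0; att < attachment_count; ++att)
        out[att] = ATTACHMENT_UNUSED;

    /* Record which slot each referenced attachment is bound to. */
    for (uint32_t slot = 0; slot < ref_count; ++slot) {
        if (refs[slot] == ATTACHMENT_UNUSED)
            continue;
        assert(refs[slot] < attachment_count);
        out[refs[slot]] = slot;
    }
}
```

For a deferred renderer with three G-buffer attachments and one output attachment, a lighting "subpass" that wrote shader location 0 to attachment 3 would pass `refs = {3}` and get `{UNUSED, UNUSED, UNUSED, 0}`, which could then be supplied as `pColorAttachmentLocations` with `colorAttachmentCount = 4`. Passing the input attachment references `{0, 1, 2}` yields `{0, 1, 2, UNUSED}` for `pColorAttachmentInputIndices`.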

Notably, this setup for mapping to the previous API can be entirely ignored for new applications as the mappings default to the API array indices, but it is available for any who want to make use of it. Applications porting existing shader bases that don’t make use of subpasses will likely also find they can make do without specifying the mappings.
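
Since the defaults are the identity mapping, a porting layer could check whether a computed table differs from the default before deciding to set it explicitly. This is only a sketch: `is_default_mapping` is a hypothetical helper, and whether the explicit call can really be skipped depends on what mapping state the application has already set in the current render pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when table[i] == i for every color
 * attachment, i.e. the table matches the default mapping. */
bool is_default_mapping(const uint32_t *table, uint32_t count)
{
    assert(table != NULL || count == 0);
    for (uint32_t i = 0; i < count; ++i) {
        if (table[i] != i)
            return false;
    }
    return true;
}
```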

Staying On-Chip

Render pass objects were designed to be more expressive than implementations were necessarily able to accelerate, leaving developers unsure whether, and when, an implementation would keep data in tile buffers between subpasses. With this extension, the API has been designed so that use cases which would require vendors to split render passes are not expressible within a single dynamic render pass (though they remain expressible using multiple render passes), reducing the possibility of performance cliffs from falling off chip.

Conclusion

This extension will be available as part of the Vulkan Roadmap 2024 milestone, which requires new high-end devices from each vendor to support it. However, we expect it to be more widely available: there are no specific hardware requirements beyond those needed for Vulkan 1.0, so platforms with regular driver updates should see this extension roll out over the next year across a large variety of hardware.

The extension proposal for VK_KHR_dynamic_rendering_local_read goes into significantly more detail about how this extension can be used, and is a good place to look for more information for anyone looking to use it.