|
| 1 | +Issues in Metal RT Implementation |
| 2 | + |
| 3 | +### 1. **Hardcoded TLAB Layout (Critical)** |
| 4 | + |
| 5 | +```plume/plume_metal.cpp#L2926-2932 |
| 6 | +// Build the TLAB: array of GPU addresses for each root parameter. |
| 7 | +uint64_t tlab[3] = { |
| 8 | + descriptorSet->rtUAVTableBuffer ? descriptorSet->rtUAVTableBuffer->gpuAddress() : 0, |
| 9 | + descriptorSet->rtASHeaderBuffer ? descriptorSet->rtASHeaderBuffer->gpuAddress() : 0, |
| 10 | + cbvAddress |
| 11 | +}; |
| 12 | +``` |
| 13 | + |
| 14 | +The TLAB assumes a fixed 3-element layout `[UAV, AS, CBV]`. This only works for shaders with exactly this root signature. Any shader with a different layout (more textures, different order, SRV textures, etc.) will break silently. |
| 15 | + |
| 16 | +**Fix:** Either generate the TLAB based on the actual pipeline layout/descriptor set layout, or document this limitation clearly. |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +### 2. **Fragile Resource Discovery in `traceRays`** |
| 21 | + |
| 22 | +```plume/plume_metal.cpp#L2850-2872 |
| 23 | +for (size_t i = 0; i < descriptorSet->resourceEntries.size(); i++) { |
| 24 | + const auto& entry = descriptorSet->resourceEntries[i]; |
| 25 | + if (entry.resource == nullptr) continue; |
| 26 | + |
| 27 | + switch (entry.type) { |
| 28 | + case RenderDescriptorRangeType::READ_WRITE_TEXTURE: |
| 29 | + case RenderDescriptorRangeType::TEXTURE: |
| 30 | + outputTexture = static_cast<MTL::Texture*>(entry.resource); |
| 31 | + break; |
| 32 | + // ... |
| 33 | + } |
| 34 | +} |
| 35 | +``` |
| 36 | + |
| 37 | +This picks the **first** texture/AS/buffer it finds. If a shader has multiple textures, only the first one ends up ;in the TLAB. Related to issue #1. |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +### 3. **Dangling Pointer Risk in `instanceContributions`** |
| 42 | + |
| 43 | +```plume/plume_metal.cpp#L2117-2118 |
| 44 | +resourceEntries[descriptorIndex].instanceContributions = metalAS->instanceContributions.empty() |
| 45 | + ? nullptr : metalAS->instanceContributions.data(); |
| 46 | +``` |
| 47 | + |
| 48 | +This stores a raw pointer to the internal data of `MetalAccelerationStructure::instanceContributions`. If the AS is destroyed or the vector reallocates, this becomes dangling. |
| 49 | + |
| 50 | +**Fix:** Copy the data instead of storing a pointer, or use shaIssues in Metal RT Implementation |
| 51 | + |
| 52 | +### 1. **Hardcoded TLAB Layout (Critical)** |
| 53 | + |
| 54 | +```plume/plume_metal.cpp#L2926-2932 |
| 55 | +// Build the TLAB: array of GPU addresses for each root parameter. |
| 56 | +uint64_t tlab[3] = { |
| 57 | + descriptorSet->rtUAVTableBuffer ? descriptorSet->rtUAVTableBuffer->gpuAddress() : 0, |
| 58 | + descriptorSet->rtASHeaderBuffer ? descriptorSet->rtASHeaderBuffer->gpuAddress() : 0, |
| 59 | + cbvAddress |
| 60 | +}; |
| 61 | +``` |
| 62 | + |
| 63 | +The TLAB assumes a fixed 3-element layout `[UAV, AS, CBV]`. This only works for shaders with exactly this root signature. Any shader with a different layout (more textures, different order, SRV textures, etc.) will break silently. |
| 64 | + |
| 65 | +**Fix:** Either generate the TLAB based on the actual pipeline layout/descriptor set layout, or document this limitation clearly. |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +### 2. **Fragile Resource Discovery in `traceRays`** |
| 70 | + |
| 71 | +```plume/plume_metal.cpp#L2850-2872 |
| 72 | +for (size_t i = 0; i < descriptorSet->resourceEntries.size(); i++) { |
| 73 | + const auto& entry = descriptorSet->resourceEntries[i]; |
| 74 | + if (entry.resource == nullptr) continue; |
| 75 | + |
| 76 | + switch (entry.type) { |
| 77 | + case RenderDescriptorRangeType::READ_WRITE_TEXTURE: |
| 78 | + case RenderDescriptorRangeType::TEXTURE: |
| 79 | + outputTexture = static_cast<MTL::Texture*>(entry.resource); |
| 80 | + break; |
| 81 | + // ... |
| 82 | + } |
| 83 | +} |
| 84 | +``` |
| 85 | + |
| 86 | +This picks the **first** texture/AS/buffer it finds. If a shader has multiple textures, only the first one ends up in the TLAB. Related to issue #1. |
| 87 | + |
| 88 | +--- |
| 89 | + |
| 90 | +### 3. **Dangling Pointer Risk in `instanceContributions`** |
| 91 | + |
| 92 | +```plume/plume_metal.cpp#L2117-2118 |
| 93 | +resourceEntries[descriptorIndex].instanceContributions = metalAS->instanceContributions.empty() |
| 94 | + ? nullptr : metalAS->instanceContributions.data(); |
| 95 | +``` |
| 96 | + |
| 97 | +This stores a raw pointer to the internal data of `MetalAccelerationStructure::instanceContributions`. If the AS is destroyed or the vector reallocates, this becomes dangling. |
| 98 | + |
| 99 | +**Fix:** Copy the data instead of storing a pointer, or use shared ownership. |
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +### 4. **Missing `useResource` for BLAS Referenced by TLAS** |
| 104 | + |
| 105 | +In `traceRays`, only the TLAS is marked: |
| 106 | +```plume/plume_metal.cpp#L2944-2946 |
| 107 | +if (tlas != nullptr) { |
| 108 | + activeComputeEncoder->useResource(tlas, MTL::ResourceUsageRead); |
| 109 | +} |
| 110 | +``` |
| 111 | + |
| 112 | +But the BLASes that the TLAS references are not marked with `useResource`. Metal requires all resources accessed by the GPU to be marked on the encoder. |
| 113 | + |
| 114 | +**Fix:** Track BLAS references in the TLAS and call `useResource` on each. |
| 115 | + |
| 116 | +--- |
| 117 | + |
| 118 | +### 5. **Memory Leak if VFT Creation Fails** |
| 119 | + |
| 120 | +```plume/plume_metal.cpp#L1867-1870 |
| 121 | +visibleFunctionTable = computePipeline->newVisibleFunctionTable(vftDesc); |
| 122 | +if (visibleFunctionTable == nullptr) { |
| 123 | + fprintf(stderr, "Failed to create visible function table.\n"); |
| 124 | +} // computePipeline is not released here, but the pipeline is left in a broken state |
| 125 | +``` |
| 126 | + |
| 127 | +If VFT creation fails, the pipeline is partially constructed but `computePipeline` is not cleaned up. The destructor will release it, but the object is in an invalid state without proper error propagation. |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +### 6. **No Intersection Shader Support** |
| 132 | + |
| 133 | +```plume/plume_metal.h#L710 |
| 134 | +MTL::IntersectionFunctionTable *intersectionFunctionTable = nullptr; |
| 135 | +``` |
| 136 | + |
| 137 | +This is declared but never populated. Custom intersection shaders (for procedural geometry) won't work. |
| 138 | + |
| 139 | +--- |
| 140 | + |
| 141 | +### 7. **Only Descriptor Set 0 is Used** |
| 142 | + |
| 143 | +```plume/plume_metal.cpp#L2822 |
| 144 | +MetalDescriptorSet* descriptorSet = raytracingDescriptorSets[0]; |
| 145 | +``` |
| 146 | + |
| 147 | +Multiple descriptor sets aren't supported for raytracing, even though `raytracingDescriptorSets` is an array. |
| 148 | + |
| 149 | +--- |
| 150 | + |
| 151 | +### 8. **Potential Race Condition on `rtBuffersDirty`** |
| 152 | + |
| 153 | +`rtBuffersDirty` is written in `setTexture()`/`setAccelerationStructure()` (called from app thread potentially) and read/written in `traceRays()` (called from command list recording). No synchronization. |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +### Summary |
| 158 | + |
| 159 | +The implementation works for the **specific raytracing example** (single UAV output, single TLAS, single CBV in that exact order). For a production-quality implementation, the TLAB construction needs to be generalized to work with arbitrary root signatures/descriptor layouts. |
| 160 | + |
| 161 | +Would you like me to help fix any of these issues?red ownership. |
| 162 | + |
| 163 | +--- |
| 164 | + |
| 165 | +### 4. **Missing `useResource` for BLAS Referenced by TLAS** |
| 166 | + |
| 167 | +In `traceRays`, only the TLAS is marked: |
| 168 | +```plume/plume_metal.cpp#L2944-2946 |
| 169 | +if (tlas != nullptr) { |
| 170 | + activeComputeEncoder->useResource(tlas, MTL::ResourceUsageRead); |
| 171 | +} |
| 172 | +``` |
| 173 | + |
| 174 | +But the BLASes that the TLAS references are not marked with `useResource`. Metal requires all resources accessed by the GPU to be marked on the encoder. |
| 175 | + |
| 176 | +**Fix:** Track BLAS references in the TLAS and call `useResource` on each. |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +### 5. **Memory Leak if VFT Creation Fails** |
| 181 | + |
| 182 | +```plume/plume_metal.cpp#L1867-1870 |
| 183 | +visibleFunctionTable = computePipeline->newVisibleFunctionTable(vftDesc); |
| 184 | +if (visibleFunctionTable == nullptr) { |
| 185 | + fprintf(stderr, "Failed to create visible function table.\n"); |
| 186 | +} // computePipeline is not released here, but the pipeline is left in a broken state |
| 187 | +``` |
| 188 | + |
| 189 | +If VFT creation fails, the pipeline is partially constructed but `computePipeline` is not cleaned up. The destructor will release it, but the object is in an invalid state without proper error propagation. |
| 190 | + |
| 191 | +--- |
| 192 | + |
| 193 | +### 6. **No Intersection Shader Support** |
| 194 | + |
| 195 | +```plume/plume_metal.h#L710 |
| 196 | +MTL::IntersectionFunctionTable *intersectionFunctionTable = nullptr; |
| 197 | +``` |
| 198 | + |
| 199 | +This is declared but never populated. Custom intersection shaders (for procedural geometry) won't work. |
| 200 | + |
| 201 | +--- |
| 202 | + |
| 203 | +### 7. **Only Descriptor Set 0 is Used** |
| 204 | + |
| 205 | +```plume/plume_metal.cpp#L2822 |
| 206 | +MetalDescriptorSet* descriptorSet = raytracingDescriptorSets[0]; |
| 207 | +``` |
| 208 | + |
| 209 | +Multiple descriptor sets aren't supported for raytracing, even though `raytracingDescriptorSets` is an array. |
| 210 | + |
| 211 | +--- |
| 212 | + |
| 213 | +### 8. **Potential Race Condition on `rtBuffersDirty`** |
| 214 | + |
| 215 | +`rtBuffersDirty` is written in `setTexture()`/`setAccelerationStructure()` (called from app thread potentially) and read/written in `traceRays()` (called from command list recording). No synchronization. |
| 216 | + |
| 217 | +--- |
| 218 | + |
| 219 | +### Summary |
| 220 | + |
| 221 | +The implementation works for the **specific raytracing example** (single UAV output, single TLAS, single CBV in that exact order). For a production-quality implementation, the TLAB construction needs to be generalized to work with arbitrary root signatures/descriptor layouts. |
| 222 | + |
| 223 | +Would you like me to help fix any of these issues? |
| 224 | + |
| 225 | +------------- |
| 226 | + |
| 227 | +Here’s the status versus the 8 issues you listed: |
| 228 | + |
| 229 | +Fixed |
| 230 | +- **#1 Hardcoded TLAB layout**: now built from the parsed Metal Shader Converter root signature JSON, so it follows the shader’s actual root layout. |
| 231 | +- **#2 Fragile resource discovery**: removed the “first texture/AS/CBV” scan; descriptor tables are populated by register + space across the bound descriptor sets. |
| 232 | +- **#3 Dangling instanceContributions pointer**: now stored as a value copy in `MetalDescriptorSet::ResourceEntry`, so no dangling. |
| 233 | +- **#7 Only descriptor set 0**: root signature `RegisterSpace` maps to descriptor set index, so multiple sets are supported. |
| 234 | + |
| 235 | +Partially/changed behavior |
| 236 | +- **#8 rtBuffersDirty race**: `traceRays` no longer reads this flag, so the original read/write race is gone. The flag is still set but unused; we can delete it or make it atomic if you prefer. |
| 237 | + |
| 238 | +Still open |
| 239 | +- **#4 Missing `useResource` for BLAS referenced by TLAS**: still not handled. |
| 240 | +- **#5 VFT creation failure cleanup/state**: unchanged. |
| 241 | +- **#6 Intersection shader support**: still not wired up. |
| 242 | + |
| 243 | +If you want, I can tackle the remaining three in the next pass. |
0 commit comments