[XPU] add fused_dropout_add XPU kernel and remove Python fallback #78629
Open
YqGe585 wants to merge 2 commits into PaddlePaddle:develop from
Conversation
…and remove Python fallback

- Add XPU forward/backward kernels for fused_dropout_add (paddle/phi/kernels/fusion/xpu/fused_dropout_add_kernel.cc and paddle/phi/kernels/fusion/xpu/fused_dropout_add_grad_kernel.cc)
- Register fused_dropout_add in the XPU2 and XPU3 op lists (FLOAT32, FLOAT16)
- Remove the Python-level fallback in fused_dropout_add.py that forced both GPU and XPU through paddle.nn.functional.dropout with independent PRNG state, producing non-comparable stochastic results

The XPU kernel uses xpu::dropout() with a resolved seed and adds the result to y. Note: element-wise results differ from GPU due to different PRNG algorithms (XPU library vs GPU Philox4); this is expected for stochastic ops.
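The fused op's semantics can be sketched in plain Python. This is a hedged reference model, not the XPU kernel itself: lists stand in for tensors, and Python's `random.Random` stands in for the device PRNG; the real kernel calls `xpu::dropout()` with the resolved seed and adds the result to `y`.

```python
import random

def fused_dropout_add(x, y, p, seed, training=True):
    """Reference semantics of fused dropout + add: out = dropout(x, p) + y.

    A plain-Python sketch; `random.Random(seed)` is a stand-in for the
    device PRNG seeded with the resolved seed.
    """
    if not training or p == 0.0:
        # Eval mode (or p == 0): dropout is the identity, so out = x + y.
        return [xi + yi for xi, yi in zip(x, y)]
    rng = random.Random(seed)        # resolved seed -> reproducible mask
    scale = 1.0 / (1.0 - p)         # inverted-dropout scaling keeps E[out] = x + y
    mask = [rng.random() >= p for _ in x]
    return [(xi * scale if keep else 0.0) + yi
            for xi, yi, keep in zip(x, y, mask)]
```

With a fixed seed the call is deterministic, which is what makes the backward pass able to reproduce the same mask later.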
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
PR Category
Custom Device
PR Types
New features
Description
paddle.incubate.nn.functional.fused_dropout_add has precision problems on XPU devices, with two root causes.

Issue 1: the Python-level fallback makes GPU and XPU use independent PRNG states

The installed Paddle's fused_dropout_add.py contains a temporary Python fallback that forces both GPU and XPU to run through paddle.nn.functional.dropout. Each device advances its own random-number state, so the dropout masks differ, producing a maximum absolute error of 0.66748 against a threshold of 0.05. The fallback also emits the warning: "Currently, fused_dropout_add maybe has precision problem, so it falls back to dropout + add."
Issue 2: XPU lacks a fused_dropout_add C++ operator

With the Python fallback removed, calling _C_ops.fused_dropout_add on an XPU device fails with NotFound: kernel fused_dropout_add is not registered, because the operator was never implemented for XPU.

Changes in this PR:
- Forward kernel (fused_dropout_add_kernel.cc): uses xpu::dropout() + xpu::add(), resolves the seed and stores it in seed_offset so that backward can reproduce the mask
- Backward kernel (fused_dropout_add_grad_kernel.cc): restores the seed from seed_offset and regenerates the mask to compute gradients
- Registers fused_dropout_add and fused_dropout_add_grad (FLOAT32, FLOAT16)
- Removes the Python fallback in fused_dropout_add.py so that all devices call the C++ kernel directly

Verification: after the fix, the XPU kernel runs normally, no longer reports a kernel-not-found error, and no longer emits the fallback warning.
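The seed_offset pattern above can be sketched as follows. This is a minimal plain-Python model, assuming `random.Random` as a stand-in for the device PRNG: the forward pass saves only the resolved seed, and the backward pass reseeds an identical generator to regenerate the same mask, so no mask tensor has to be stored between the two passes.

```python
import random

def forward(x, y, p, seed):
    """out = dropout(x, p) + y; saves the resolved seed for backward."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p)
    mask = [rng.random() >= p for _ in x]
    out = [(xi * scale if keep else 0.0) + yi
           for xi, yi, keep in zip(x, y, mask)]
    seed_offset = (seed, 0)  # saved for backward (offset unused in this sketch)
    return out, seed_offset

def backward(grad_out, p, seed_offset):
    """Regenerates the forward mask from the saved seed to form gradients."""
    seed, _ = seed_offset
    rng = random.Random(seed)        # same seed -> same mask as forward
    scale = 1.0 / (1.0 - p)
    mask = [rng.random() >= p for _ in grad_out]
    grad_x = [(g * scale if keep else 0.0)
              for g, keep in zip(grad_out, mask)]
    grad_y = list(grad_out)          # d(out)/d(y) = 1, so grad_y = grad_out
    return grad_x, grad_y
```

Because both passes seed the generator identically, grad_x is nonzero exactly at the positions the forward pass kept.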
Does this change numerical results?

Yes. With the Python fallback removed, XPU devices use the native XPU C++ kernel (xpu::dropout()) instead of the paddle.nn.functional.dropout fallback. Since the GPU uses the Philox4 PRNG while the XPU uses the XPU library's own PRNG, the two generate different dropout masks even when given the same seed. This is an inherent property of stochastic ops across devices and does not affect correctness on a single device.
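The cross-device property can be illustrated with a sketch. The two streams below are stand-ins for two different PRNG algorithms (the second stream is perturbed to mimic a different algorithm; it is not Philox4 or the XPU generator): given the same seed they produce different masks, yet each mask keeps roughly (1 - p) of the elements, so each device is individually correct.

```python
import random

p, n, seed = 0.5, 10000, 2024

# Stream A: one PRNG algorithm, seeded with `seed`.
rng_a = random.Random(seed)
mask_a = [rng_a.random() >= p for _ in range(n)]

# Stream B: same seed, but the stream is perturbed by one draw, as a
# stand-in for a different PRNG algorithm (e.g. Philox4 vs the XPU library).
rng_b = random.Random(seed)
rng_b.random()
mask_b = [rng_b.random() >= p for _ in range(n)]

# The masks differ element-wise, but both keep ~(1 - p) of the elements,
# so each is a statistically valid dropout mask on its own.
keep_a = sum(mask_a) / n
keep_b = sum(mask_b) / n
```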