[Codegen] add FoldExtractSliceOfFillThroughBlockArg pattern to TileAndDistributeToWorkgroups #22983
Closed
bangtianliu wants to merge 1 commit into iree-org:main from
45f6165 to a5b2ab5
cd6ffd3 to 8dd5f67
I verified this PR's changes by commenting out the pattern in … With the changes from this PR and #22953, O3 can now be used for the llama 8b fp16 quality tests added in #22379.
…dDistributeToWorkgroups ci-extra: test_torch Signed-off-by: Bangtian Liu <liubangtian@gmail.com>
8dd5f67 to ae969b8
Problem
In split reduction lowering (e.g., iree_linalg_ext.arg_compare), linalg.fill operations are defined outside nested scf.forall loops. The existing FoldExtractSliceOfBroadcastPattern folds intermediate broadcast + extract_slice ops, exposing the fill result to the inner loop's shared_outs. However, inside the inner loop, tensor.extract_slice on the block argument still needs the fill to be materialized locally for correct bufferization.
This causes bufferization issues because the fill is outside the inner loop, but each workgroup needs its own properly initialized tile.
Solution
This PR adds FoldExtractSliceOfFillThroughBlockArgPattern, which traces block arguments back to their defining linalg.fill and creates a new fill on the extracted slice inside the loop.
How it works with FoldExtractSliceOfBroadcastPattern
For the nested loop case in split reduction, the two patterns are applied together.
This ensures fill initialization happens at the innermost level where computation occurs, enabling correct bufferization without race conditions.
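The rewrite described above can be illustrated with a small toy model. This is not IREE or MLIR code: the classes `Fill`, `Empty`, `BlockArg`, and `ExtractSlice`, and the functions `evaluate` and `fold_extract_slice_of_fill_through_block_arg`, are hypothetical stand-ins invented for this sketch. It assumes 1-D, unit-stride slices and treats a shared_outs block argument as carrying the value of its init operand on entry to the loop body. Under those assumptions, it shows that re-creating the fill on the extracted slice yields the same tile as slicing the outer fill.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Empty:
    """Toy stand-in for tensor.empty with a static 1-D size."""
    size: int


@dataclass(frozen=True)
class Fill:
    """Toy stand-in for linalg.fill: writes `value` over every element of `dest`."""
    value: float
    dest: "Node"


@dataclass(frozen=True)
class BlockArg:
    """Toy stand-in for a shared_outs block argument tied to `init` (assumption)."""
    init: "Node"


@dataclass(frozen=True)
class ExtractSlice:
    """Toy stand-in for a 1-D, unit-stride tensor.extract_slice."""
    source: "Node"
    offset: int
    size: int


Node = Union[Empty, Fill, BlockArg, ExtractSlice]


def evaluate(node: Node) -> List[Optional[float]]:
    """Compute the tensor a node denotes; None marks unspecified contents."""
    if isinstance(node, Empty):
        return [None] * node.size
    if isinstance(node, Fill):
        return [node.value] * len(evaluate(node.dest))
    if isinstance(node, BlockArg):
        return evaluate(node.init)
    if isinstance(node, ExtractSlice):
        return evaluate(node.source)[node.offset:node.offset + node.size]
    raise TypeError(f"unknown node: {node!r}")


def fold_extract_slice_of_fill_through_block_arg(op: Node) -> Node:
    """If `op` slices a block argument whose init is a fill, return a new
    fill whose destination is that slice; otherwise return `op` unchanged."""
    if not isinstance(op, ExtractSlice):
        return op
    if not isinstance(op.source, BlockArg):
        return op
    init = op.source.init
    if not isinstance(init, Fill):
        return op
    # The original slice is kept and becomes the destination of the local fill.
    return Fill(init.value, op)


# Usage: an outer fill feeds the loop's block argument; the loop body slices it.
outer_fill = Fill(0.0, Empty(128))
tile = ExtractSlice(BlockArg(outer_fill), offset=32, size=32)
local_fill = fold_extract_slice_of_fill_through_block_arg(tile)
```

In this model `local_fill` is a fill over the 32-element tile, so the initialization sits next to the slice that consumes it instead of outside the loop. A slice whose block argument is not initialized by a fill is left untouched.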
ci-extra: test_torch