Attempt to optimize run_field_matrix_solver! for CUDA #2401

@petebachant

run_field_matrix_solver! shows up as the top kernel in the "flagship" config (the only change being 0M instead of 1M microphysics):

Image

The profile above is from results/nsys/baseline.nsys-rep, at commit a0bb502 of this repo.

I've tried a few modifications, which are summarized here. So far, none have sped up the simulation; they have only improved the registers-per-thread and occupancy metrics.
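
For context on how those two metrics relate, here is a rough Python sketch of a register-limited theoretical occupancy estimate. It is a simplified model, not the profiler's actual calculation: the function name is hypothetical, the default hardware limits (65536 registers and 64 resident warps per SM, registers allocated per warp in units of 256) are assumed values that should be checked against the GPU actually used, and shared-memory and blocks-per-SM limits are ignored.

```python
import math


def theoretical_occupancy(regs_per_thread, threads_per_block,
                          regs_per_sm=65536,      # assumed register file size per SM
                          max_warps_per_sm=64,    # assumed resident-warp limit per SM
                          warp_size=32,
                          reg_alloc_unit=256):    # assumed per-warp allocation granularity
    """Estimate register-limited occupancy (hypothetical helper, simplified model)."""
    warps_per_block = math.ceil(threads_per_block / warp_size)
    # Registers are allocated per warp, rounded up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * warp_size / reg_alloc_unit) * reg_alloc_unit
    # How many whole blocks fit under the register limit and the warp limit.
    blocks_by_regs = (regs_per_sm // regs_per_warp) // warps_per_block
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(blocks_by_regs, blocks_by_warps)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    # Illustrative register counts only; these are not measurements from this issue.
    for regs in (128, 64, 32):
        occ = theoretical_occupancy(regs, threads_per_block=256)
        print(f"{regs} regs/thread -> theoretical occupancy {occ:.2f}")
```

Under this model, halving registers per thread can double theoretical occupancy. That does not by itself guarantee a shorter runtime: if the kernel is limited by memory bandwidth or latency rather than by the number of resident warps, the metrics can improve while the wall time stays flat, which would be consistent with what I'm seeing.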
