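I tried a range of approaches, debugging while reading the generated assembly on godbolt.org. Changing the array indices from int to size_t gave a small improvement. Adding -ffast-math and -march=native gave a clear improvement. Converting the star struct from AOS to SOA helped a lot. I also turned some temporary variables into arrays, trading space for time. For alignas I tried 8, 16, 32, 64, and 128: 32 and 64 were about equally good, while 128 was somewhat slower. Splitting some of the more complex loops into several smaller loops gave a slight improvement. I added #pragma omp simd before every loop, but in my tests it seems to be enough to put it before the outermost loop only; adding it to the inner loops or not makes no difference. Looking at the assembly, a product of three or more factors seems to produce more complicated code than a two-operand multiplication, and I am not sure whether that reduces speed.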
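To make the combination of changes above concrete, here is a minimal sketch of an SOA star layout with alignas(64), size_t indices, and #pragma omp simd on the outermost loop only. It is not the code from this PR: the names `Stars`, `step`, `N`, `G`, `eps2`, and `dt`, and all constant values, are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical constants; the real values in the PR may differ.
constexpr std::size_t N = 48;   // number of stars
constexpr float G = 1.0f;       // gravitational constant
constexpr float eps2 = 1e-6f;   // softening term, avoids division by zero
constexpr float dt = 0.01f;     // time step

// SOA layout: one contiguous, 64-byte-aligned array per field,
// instead of an array of {px, py, pz, vx, vy, vz, mass} structs (AOS).
struct Stars {
    alignas(64) float px[N];
    alignas(64) float py[N];
    alignas(64) float pz[N];
    alignas(64) float vx[N];
    alignas(64) float vy[N];
    alignas(64) float vz[N];
    alignas(64) float mass[N];
};
static_assert(alignof(Stars) == 64, "fields should be 64-byte aligned");

void step(Stars& s) {
    // The pragma sits on the outermost loop only; each i is independent
    // because this loop reads positions and writes only the velocity of star i.
    #pragma omp simd
    for (std::size_t i = 0; i < N; ++i) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (std::size_t j = 0; j < N; ++j) {
            float dx = s.px[j] - s.px[i];
            float dy = s.py[j] - s.py[i];
            float dz = s.pz[j] - s.pz[i];
            float d2 = dx * dx + dy * dy + dz * dz + eps2;
            float inv = 1.0f / (d2 * std::sqrt(d2));
            float f = s.mass[j] * (G * dt) * inv;  // j == i contributes 0 since dx = dy = dz = 0
            ax += dx * f;
            ay += dy * f;
            az += dz * f;
        }
        s.vx[i] += ax;
        s.vy[i] += ay;
        s.vz[i] += az;
    }
    // The position update is split into its own small loop.
    #pragma omp simd
    for (std::size_t i = 0; i < N; ++i) {
        s.px[i] += s.vx[i] * dt;
        s.py[i] += s.vy[i] * dt;
        s.pz[i] += s.vz[i] * dt;
    }
}
```

With GCC or Clang, something like `-O3 -march=native -ffast-math -fopenmp-simd` should enable the pragma without pulling in the OpenMP runtime; without an OpenMP flag the pragma is ignored and the code still runs, just relying on auto-vectorization.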
Baseline result:
Initial energy: -8.571528
Final energy: -8.511633
Time elapsed: 1546 ms
After optimization:
simd optimized
Initial energy: -9.936085
Final energy: -9.926659
Time elapsed: 198 ms
That is roughly a 7.8x speedup (1546 ms down to 198 ms).