nowresult.csv
2134 lines (2134 loc) · 112 KB
into Latency 7.65 0.19
bound Latency 12.74 0.0
pushad Latency 15.62 0.0
aam_throughput Latency 8.42 0.0
enter 4, 1 Latency 50.72 0.24
aam_latency Latency 29.6 0.0
enter 4, 4 Latency 68.59 0.0
enter 4, 0 Latency 20.0 0.0
daa Latency 23.37 0.0
enter 4, 2 Latency 63.99 0.0
enter 4, 3 Latency 69.15 0.0
das Latency 25.5 0.0
aad Latency 18.63 0.0
sahf Latency 1.16 0.0
aaa Latency 38.16 0.0
lahf_sahf Latency 1.17 0.0
salc Latency 5.88 0.0
leave Latency 4.82 0.0
salc_inc_al Latency 5.43 0.0
popad Latency 12.22 0.02
aas Latency 35.22 0.0
lahf Latency 0.98 0.0
Case 1: Consecutive single loops Latency 6.15 nan
Case 5: Loop with one branch having simple repetitive pattern of period count2 inside Latency 0.74 0.15
Case 8: Loop with alternating indirect branch inside Latency 1.18 0.23
Case 6: Loop with one branch having complex repetitive pattern of period count2 inside Latency 1.92 0.2
Case 4: Loop with 2 alternating branches inside Latency 3.68 2.86
Case 2: Nested loops Latency 7.33 3.3
Case 3: Loop with 1 alternating branch inside Latency 4.52 4.15
Case 7: Loop with one 0011 branch and one 00011 branch inside Latency 0.75 0.28
Case 9: Loop with 3-way indirect branch inside Latency 1.88 0.47
Case 1: Consecutive single loops Latency 4.45 nan
Case 11: One loop with multiple branches inside Latency 0.28 0.13
Case 8: Loop with alternating indirect branch inside Latency 1.37 0.23
Case 6: Loop with one branch having complex repetitive pattern of period count2 inside Latency 0.73 0.19
Case 4: Loop with 2 alternating branches inside Latency 3.67 1.71
Case 2: Nested loops Latency 7.35 3.33
Case 3: Loop with 1 alternating branch inside Latency 4.24 3.41
Case 7: Loop with one 0011 branch and one 00011 branch inside Latency 0.8 2.38
Case 10: One loop with multiple loops inside Latency -2.01 2.09
Case 9: Loop with 3-way indirect branch inside Latency 1.77 0.37
Case 5: Loop with one branch having simple repetitive pattern of period count2 inside Latency 1.9 0.21
stride1 = 4096, stride2 = 64 Latency 1.89 0.01
stride1 = 512, stride2 = 32 Latency 1.81 0.0
stride1 = 4096, stride2 = 512 Latency 1.81 0.0
stride1 = 256, stride2 = 32 Latency 1.96 0.0
stride1 = 2048, stride2 = 0 Latency 1.22 0.19
stride1 = 1024, stride2 = 8 Latency 0.99 0.0
stride1 = 2048, stride2 = 8 Latency 1.1 0.11
stride1 = 1024, stride2 = 0 Latency 0.99 0.0
stride1 = 4096, stride2 = 0 Latency 8.94 0.16
stride1 = 16384, stride2 = 0 Latency 9.7 0.22
stride1 = 128, stride2 = 64 Latency 1.81 0.0
stride1 = 4096, stride2 = 8 Latency 1.22 0.19
stride1 = 256, stride2 = 64 Latency 1.88 0.01
stride1 = 2048, stride2 = 32 Latency 1.96 0.0
stride1 = 1024, stride2 = 128 Latency 1.96 0.0
stride1 = 2048, stride2 = 1024 Latency 1.96 0.0
stride1 = 1024, stride2 = 64 Latency 1.88 0.01
stride1 = 2048, stride2 = 64 Latency 1.89 0.01
stride1 = 16384, stride2 = 8 Latency 1.13 0.17
stride1 = 4096, stride2 = 32 Latency 1.81 0.0
stride1 = 2048, stride2 = 512 Latency 1.89 0.01
stride1 = 1024, stride2 = 512 Latency 1.81 0.0
stride1 = 2048, stride2 = 256 Latency 1.88 0.01
stride1 = 2048, stride2 = 128 Latency 1.88 0.01
stride1 = 4096, stride2 = 128 Latency 1.96 0.0
stride1 = 512, stride2 = 64 Latency 1.96 0.0
stride1 = 256, stride2 = 128 Latency 1.81 0.0
stride1 = 64, stride2 = 32 Latency 1.88 0.01
stride1 = 1024, stride2 = 32 Latency 1.81 0.0
stride1 = 8192, stride2 = 0 Latency 8.96 0.19
stride1 = 1024, stride2 = 256 Latency 1.81 0.0
stride1 = 4096, stride2 = 1024 Latency 1.88 0.01
stride1 = 128, stride2 = 32 Latency 1.95 0.0
stride1 = 512, stride2 = 256 Latency 1.96 0.0
stride1 = 8192, stride2 = 8 Latency 1.21 0.18
stride1 = 512, stride2 = 128 Latency 1.88 0.01
stride1 = 4096, stride2 = 256 Latency 1.89 0.01
cvtpi2ps xmm, mmx / cvtps2pi mmx, xmm / POR mmx,mmx Latency 4.31 0.0
cvtsi2sd xmm, r64 / cvtsd2si r64, xmm Latency 12.87 0.0
movd xmm,m32 Throughput 1.32 0.0
movq xmm,r64 Throughput 1.0 0.0
cvtsi2ss xmm,r32 Throughput 5.96 0.0
movntpd m128, xmm / movaps xmm, m128 Latency 126.36 5.43
mov r8, m8 Throughput 1.19 0.0
movdqa m128, xmm Throughput 1.8 0.0
movq2dq xmm, mmx Throughput 1.07 0.0
cvtsd2si r64, xmm / cvtsi2sd xmm, r64 / MAXPD xmm,xmm Latency 7.32 0.01
pextrb m8,xmm,1 Throughput 1.72 0.0
cvtsd2ss xmm,xmm Latency 14.85 0.0
movlps m64,xmm Throughput 1.66 0.0
movups m128,xmm Throughput 1.66 0.0
movd xmm, m32 / movd m32, xmm Latency 2.15 0.0
movntdqa xmm, m / movdqa m, xmm Latency 2.15 0.0
cvtpd2ps xmm,[m128] Throughput 1.42 0.0
mov r8h, m8 Throughput 1.29 0.0
cvtss2si r64, xmm Throughput 0.99 0.0
cvtps2pi mmx, xmm Throughput 1.07 0.0
cvtdq2ps xmm,xmm Latency 14.85 0.0
movd mmx, r32 / movd r32, mmx Latency 3.75 0.0
mov m32,r32 Throughput 1.8 0.0
mov m64,r64 Throughput 1.66 0.0
pextrq r64, xmm,1 / movq xmm, r64 Latency 3.47 0.0
mov r8h,r8 Throughput 0.97 0.0
cvtsi2sd xmm,r64 Throughput 1.98 0.0
mov r8h,r8h Latency 0.99 0.0
movhps m64,xmm Throughput 1.55 0.0
movntdqa xmm,m Throughput 1.32 0.0
movsd m64, xmm / movsd xmm, m64 Latency 2.22 0.04
movntps m128,xmm Throughput 1.67 0.0
mov m16,r16 Throughput 1.66 0.0
cvtps2pi mmx, xmm / cvtpi2ps xmm, mmx / POR xmm,xmm Latency 3.97 0.0
cvtpd2dq xmm,[m128] Throughput 1.42 0.0
cvtpi2pd xmm, mmx / cvtpd2pi mmx, xmm / POR mmx,mmx Latency 4.65 0.0
cvtpi2pd xmm,[m] Throughput 1.34 0.0
pextrw r32,xmm,1 Throughput 1.08 0.0
cvtsi2ss xmm,[m32] Throughput 6.41 0.17
cvtsd2si r32, xmm Throughput 0.99 0.0
cvtss2sd xmm, xmm Latency 16.09 0.0
movq r64, xmm Throughput 1.08 0.0
mov m16, r16 / mov r16, m16 Latency 2.48 0.0
cvtsd2si r64,[m64] Throughput 1.24 0.0
mov m8, r8h / mov r8h, m8 Latency 15.02 0.0
movlpd xmm, m Throughput 1.42 0.0
cvtpi2ps xmm,[m] Throughput 1.34 0.0
movd xmm, r32 Throughput 1.08 0.0
pinsrq xmm,r64,1 Throughput 1.0 0.0
movdqa m128, xmm / movdqa xmm, m128 Latency 2.04 0.03
movq mmx, r64 / movq r64, mmx Latency 3.75 0.0
movsd xmm,xmm Latency 1.16 0.0
cvtdq2ps xmm, xmm / cvtps2dq xmm, xmm Latency 2.68 0.0
cvtpi2pd xmm, mmx / cvtpd2pi mmx, xmm Latency 4.68 0.0
movapd m128, xmm / movapd xmm, m128 Latency 2.15 0.0
movss xmm, xmm Latency 1.22 0.03
pextrw r32,mmx,1 Throughput 0.99 0.0
cvtpd2dq xmm, xmm Latency 5.96 0.0
movlhps xmm,xmm Latency 1.16 0.0
movapd xmm, m128 Throughput 1.43 0.0
cvtps2pi mmx,[m] Throughput 1.24 0.0
pinsrw xmm,r32,1 Throughput 1.08 0.0
pinsrw xmm, r32,1 / movd r32, xmm Latency 4.29 0.0
cvtsd2si r64, xmm / cvtsi2sd xmm, r64 / POR xmm,xmm Latency 8.25 0.0
cvtsd2ss xmm, xmm / cvtss2sd xmm, xmm / MAXPD xmm,xmm Latency 1.99 0.0
mov m8, r8 / mov r8, m8 Latency 2.68 0.0
movdqu xmm, m128 Throughput 1.32 0.0
cvtpi2ps xmm, mmx / cvtps2pi mmx, xmm Latency 4.83 0.0
movq xmm, m64 Throughput 1.43 0.0
pinsrb xmm,r32,1 Throughput 1.0 0.0
cvtsi2sd xmm, r64 / cvtsd2si r64, xmm / OR r64,r64 Latency 8.28 0.0
cvtsi2sd xmm,r32 Throughput 6.46 0.0
movdqa xmm, xmm / movdqa xmm, xmm / POR xmm,xmm Latency 1.0 0.0
movd m32, xmm Throughput 1.66 0.0
movaps xmm, xmm / movaps xmm, xmm / POR xmm,xmm Latency 2.53 0.0
mov r8h, m8 / mov m8, r8h Latency 15.02 0.0
pmovmskb r32,xmm Throughput 0.99 0.0
cvtsi2ss xmm, r64 / cvtss2si r64, xmm Latency 12.93 0.03
cvtss2si r32,[m32] Throughput 1.24 0.0
mov r8h,m8 Throughput 1.28 0.0
movaps xmm, m128 Throughput 1.32 0.0
pinsrw xmm,m16,1 Throughput 0.99 0.0
pinsrd xmm, r32,1 / movd r32, xmm Latency 4.29 0.0
movntpd m128,xmm Throughput 1.55 0.0
pextrw m16,xmm,1 Throughput 1.72 0.0
movmskps r32, xmm / movd xmm, r32 Latency 3.75 0.0
cvtsd2si r64, xmm Throughput 1.08 0.0
movq m64, xmm / movq xmm, m64 Latency 1.98 0.0
movq mmx, m64 Throughput 1.43 0.0
pextrd r32,xmm,1 Throughput 1.0 0.0
movdqa m, xmm Throughput 1.66 0.0
cvtsi2sd xmm, r32 / cvtsd2si r32, xmm / OR r32,r32 Latency 3.36 0.02
movq m64,mmx Throughput 1.66 0.0
pextrq m64,xmm,1 Throughput 1.72 0.0
cvtpd2pi mmx, xmm Throughput 1.57 0.0
movlpd m, xmm / movlpd xmm, m Latency 4.29 0.0
movmskpd r32,xmm Throughput 1.17 0.0
mov r8h, r8h Latency 1.07 0.0
movdqa xmm, xmm / movdqa xmm, xmm / MAXPS xmm,xmm Latency 2.69 0.02
pextrd r32, xmm,1 / movd xmm, r32 Latency 3.76 0.0
pextrb r32,xmm,1 Throughput 1.0 0.0
movaps m128,xmm Throughput 1.8 0.0
movlhps xmm, xmm Latency 1.15 0.0
movdq2q mmx, xmm / movq2dq xmm, mmx Latency 1.08 0.0
mov m8,r8h Throughput 1.66 0.0
pinsrd xmm,m32,1 Throughput 1.1 0.08
cvtsd2si r32, xmm / cvtsi2sd xmm, r32 / POR xmm,xmm Latency 4.68 0.01
movd mmx,r32 Throughput 1.0 0.0
cvtsd2si r32,[m64] Throughput 1.24 0.0
movq m64,xmm Throughput 1.8 0.0
cvtdq2pd xmm, xmm / cvtpd2dq xmm, xmm / MAXPD xmm,xmm Latency 4.52 0.0
movaps m128, xmm / movups xmm, m128 Latency 1.98 0.0
movd r32, xmm Throughput 1.08 0.0
pinsrb xmm,m8,1 Throughput 1.08 0.0
movlps m64, xmm / movlps xmm, m64 Latency 3.96 0.0
movq mmx,r64 Throughput 1.08 0.0
movsd m64,xmm Throughput 1.66 0.0
cvtps2pi mmx, xmm / cvtpi2ps xmm, mmx / MAXPS xmm,xmm Latency 3.69 0.02
movss xmm, xmm / movss xmm, xmm Latency 1.16 0.0
pinsrw mmx, r32,1 / movd r32, mmx Latency 4.29 0.0
movhps xmm, m64 Throughput 1.42 0.0
cvtpd2ps xmm,xmm Latency 6.06 0.34
movss m32,xmm Throughput 1.66 0.0
movdqu m128,xmm Throughput 1.66 0.0
movsd xmm, xmm / movsd xmm, xmm Latency 1.16 0.0
movq r64, mmx Throughput 0.99 0.0
cvtdq2pd xmm, xmm / cvtpd2dq xmm, xmm / POR xmm,xmm Latency 4.69 0.02
movlps xmm, m64 Throughput 1.31 0.0
movhps m64, xmm / movhps xmm, m64 Latency 5.49 0.03
cvtsi2ss xmm,r64 Throughput 1.98 0.0
mov r64, m64 Throughput 1.56 0.27
movaps xmm, xmm / movaps xmm, xmm / MAXPS xmm,xmm Latency 2.13 0.0
movdqa m128,xmm Throughput 1.8 0.0
movdqa xmm, m128 Throughput 1.43 0.0
movq xmm, m64 / movq m64, xmm Latency 2.15 0.0
movss xmm,xmm Latency 1.16 0.0
lddqu xmm,m128 Throughput 1.32 0.0
cvtpi2pd xmm,mmx Throughput 0.99 0.0
movd xmm,r32 Throughput 1.05 0.02
mov r16, m16 Throughput 1.29 0.0
movapd m128,xmm Throughput 1.66 0.0
cvtps2dq xmm,[m] Throughput 1.24 0.0
movss xmm, m32 Throughput 1.43 0.0
cvtsd2si r32, xmm / cvtsi2sd xmm, r32 / MAXPD xmm,xmm Latency 3.64 0.0
mov r8h, r8 / mov r8, r8h Latency 0.99 0.0
mov m8,r8 Throughput 1.66 0.0
cvtsd2ss xmm, xmm / cvtss2sd xmm, xmm / POR xmm,xmm Latency 2.98 0.0
cvtsi2sd xmm, r32 / cvtsd2si r32, xmm Latency 4.83 0.0
pextrd m32,xmm,1 Throughput 1.72 0.0
cvtps2pd xmm,[m64] Throughput 1.43 0.0
pinsrd xmm,r32,1 Throughput 1.04 0.0
lddqu xmm, m128 / movdqa m128, xmm Latency 2.15 0.0
mov m8, r8h Throughput 1.8 0.0
cvtsd2ss xmm, xmm / cvtss2sd xmm, xmm Latency 1.98 0.0
movd r32, mmx Throughput 1.07 0.0
pinsrq xmm,m64,1 Throughput 1.0 0.0
cvtdq2pd xmm,xmm Latency 5.96 0.0
cvtss2sd xmm,m32 Throughput 6.33 0.03
movntq m64, mmx / movq mmx, m64 Latency 127.23 5.09
movaps m128, xmm / movaps xmm, m128 Latency 1.98 0.0
movq m64, mmx / movq mmx, m64 Latency 2.15 0.0
movmskps r32,xmm Throughput 0.99 0.0
mov r8h, r8h / mov r8h, r8h Latency 0.99 0.0
cvtps2pd xmm, xmm Latency 5.5 0.0
movlhps xmm, xmm / movlhps xmm, xmm Latency 1.16 0.0
cvtdq2ps xmm,[m] Throughput 1.24 0.0
cvtdq2pd xmm,[m64] Throughput 1.45 0.0
cvtsd2ss xmm,m64 Throughput 6.34 0.04
movq xmm, r64 Throughput 1.0 0.0
pinsrb xmm, r32,1 / movd r32, xmm Latency 4.35 0.03
cvtsi2sd xmm,[m64] Throughput 1.98 0.0
cvtss2si r32, xmm / cvtsi2ss xmm, r32 / MAXPS xmm,xmm Latency 3.69 0.02
pextrb r32, xmm,1 / movd xmm, r32 Latency 3.47 0.0
pmovmskb r32, xmm / movd xmm, r32 Latency 3.81 0.03
movdqu m128, xmm / movdqu xmm, m128 Latency 2.33 0.27
movq m64, xmm Throughput 1.66 0.0
cvtps2dq xmm, xmm Latency 18.3 0.04
movnti m32,r32 Throughput 1.54 0.0
cvtpd2pi mmx,[m] Throughput 1.42 0.0
movmskpd r32, xmm / movd xmm, r32 Latency 3.47 0.0
pextrw r32, xmm,1 / movd xmm, r32 Latency 3.47 0.0
movd mmx, r32 Throughput 1.08 0.0
movnti m32, r32 / mov r32, m32 Latency 127.19 4.57
movups xmm, m128 Throughput 1.32 0.0
cvtdq2pd xmm, xmm / cvtpd2dq xmm, xmm Latency 4.75 0.04
cvtss2si r32, xmm Throughput 0.99 0.0
pinsrq xmm, r64,1 / movq r64, xmm Latency 4.29 0.0
pinsrw mmx,r32,1 Throughput 1.08 0.0
cvtpd2ps xmm, xmm / cvtps2pd xmm, xmm Latency 4.68 0.0
cvtsi2ss xmm, r32 / cvtss2si r32, xmm / OR r32,r32 Latency 3.36 0.02
movups m128, xmm / movaps xmm, m128 Latency 2.15 0.0
movsd xmm, xmm Latency 1.16 0.0
cvtdq2ps xmm, xmm / cvtps2dq xmm, xmm / POR xmm,xmm Latency 3.02 0.01
cvtss2si r32, xmm / cvtsi2ss xmm, r32 / POR xmm,xmm Latency 4.68 0.02
movd xmm, r32 / movd r32, xmm Latency 3.47 0.0
cvtpi2ps xmm,mmx Throughput 1.4 0.0
movss m32, xmm / movss xmm, m32 Latency 2.15 0.0
mov r8, r8h Throughput 0.97 0.0
movq xmm,m64 Throughput 1.32 0.0
cvtsi2sd xmm,[m32] Throughput 6.27 0.0
movdq2q mmx,xmm Throughput 0.99 0.0
movntdq m128, xmm / movdqa xmm, m128 Latency 124.95 0.48
movlpd m,xmm Throughput 1.66 0.0
movntq m64,mmx Throughput 1.82 0.0
mov m64, r64 / mov r64, m64 Latency 1.98 0.0
pextrq r64,xmm,1 Throughput 1.0 0.0
mov r32, m32 Throughput 1.43 0.0
movntdq m128,xmm Throughput 1.67 0.0
movq xmm, r64 / movq r64, xmm Latency 3.47 0.0
cvtdq2ps xmm, xmm / cvtps2dq xmm, xmm / MAXPS xmm,xmm Latency 11.84 0.0
mov m32, r32 / mov r32, m32 Latency 1.98 0.0
cvtpd2pi mmx, xmm / cvtpi2pd xmm, mmx / POR xmm,xmm Latency 4.64 0.0
cvtpd2pi mmx, xmm / cvtpi2pd xmm, mmx / MAXPD xmm,xmm Latency 3.69 0.02
cvtsi2ss xmm, r32 / cvtss2si r32, xmm Latency 4.83 0.0
movntps m128, xmm / movaps xmm, m128 Latency 124.78 0.39
movsd xmm, m64 Throughput 1.43 0.0
pextrw r32, mmx,1 / movd mmx, r32 Latency 3.76 0.0
Case 1: SSE2 Latency 0.74 0.0
div , registersize 32, 1 / 1 Throughput 4.97 0.0
idiv , registersize 32 Throughput 7.22 0.01
idiv , registersize 16 Throughput 7.61 0.0
idiv , registersize 32, 1 / 1 Throughput 4.64 0.0
idiv , registersize 64, 1 / 1 Throughput 61.34 4.37
Throughput with memory operand: idiv , registersize 8, 1 / 1 Throughput 6.99 0.01
idiv , registersize 8, 1 / 1 Throughput 6.99 0.01
Throughput with memory operand: div , registersize 32, 1 / 1 Throughput 4.97 0.0
div , registersize 8 Throughput 4.97 0.0
Throughput with memory operand: div , registersize 8, 1 / 1 Throughput 4.97 0.0
div , registersize 16 Throughput 5.96 0.0
div , registersize 16, 1 / 1 Throughput 5.96 0.0
idiv , registersize 8 Throughput 6.95 0.0
Throughput with memory operand: idiv , registersize 32, 1 / 1 Throughput 5.06 0.01
div , registersize 32 Throughput 6.99 0.01
idiv , registersize 64 Throughput 66.91 0.51
idiv , registersize 16, 1 / 1 Throughput 7.62 0.0
div , registersize 64, 1 / 1 Throughput 50.36 3.88
Throughput with memory operand: idiv , registersize 64, 1 / 1 Throughput 61.55 3.96
div , registersize 8, 1 / 1 Throughput 4.97 0.0
Throughput with memory operand: div , registersize 64, 1 / 1 Throughput 52.04 0.03
Throughput with memory operand: div , registersize 16, 1 / 1 Throughput 5.96 0.0
Throughput with memory operand: idiv , registersize 16, 1 / 1 Throughput 7.66 0.02
div , registersize 64 Throughput 54.97 1.9
cvtpd2ps xmm,xmm + maxps xmm,xmm Latency 10.69 0.0
dppd xmm,xmm + por xmm,xmm Latency 7.15 0.03
pshufd xmm,xmm Latency 0.99 0.0
cvtps2dq xmm,xmm + maxps xmm,xmm Latency 9.65 0.0
insertps xmm,xmm + por xmm,xmm Latency 3.09 0.0
sqrtps xmm,xmm + por xmm,xmm Latency 12.87 0.0
movsldup xmm,xmm + por xmm,xmm Latency 3.36 0.0
movss xmm,xmm + por xmm,xmm Latency 3.1 0.0
shufps xmm,xmm + addps xmm,xmm Latency 2.98 0.0
maxpd xmm,xmm Latency 2.15 0.0
cvtps2dq xmm,xmm + por xmm,xmm Latency 3.75 0.0
pshufd xmm,xmm + por xmm,xmm Latency 1.08 0.0
pshufd xmm,xmm + addps xmm,xmm Latency 3.76 0.0
cvtps2pd xmm,xmm + maxps xmm,xmm Latency 9.86 0.0
andps xmm,xmm + por xmm,xmm Latency 3.34 0.0
dppd + maxpd + por xmm,xmm Latency 5.96 0.0
movhlps xmm,xmm Latency 1.16 0.0
paddw xmm,xmm + por xmm,xmm Latency 1.07 0.0
dpps xmm,xmm + por xmm,xmm Latency 11.46 0.03
blendps xmm,xmm + por xmm,xmm Latency 3.76 0.0
mulps xmm,xmm Latency 3.22 0.0
orps xmm,xmm + por xmm,xmm Latency 3.08 0.0
maxps xmm,xmm + paddw xmm,xmm Latency 3.76 0.0
shufps xmm,xmm + por xmm,xmm Latency 3.36 0.0
cvtps2dq + por + maxps xmm,xmm Latency 2.98 0.0
roundps xmm,xmm + maxps xmm,xmm Latency 3.22 0.0
movddup xmm,xmm + maxpd xmm,xmm Latency 2.75 0.0
dpps + por + maxps xmm,xmm Latency 7.95 0.0
movsldup xmm,xmm + maxps xmm,xmm Latency 2.75 0.0
pblendw xmm,xmm + maxpd xmm,xmm Latency 3.75 0.0
movsd xmm,xmm Latency 1.16 0.0
pblendw xmm,xmm Latency 0.99 0.0
movdqa xmm,xmm Latency 0.99 0.0
dpps + maxps + por xmm,xmm Latency 8.97 0.0
pblendw xmm,xmm + por xmm,xmm Latency 0.99 0.0
cvtdq2ps xmm,xmm + maxps xmm,xmm Latency 8.91 0.0
shufps xmm,xmm + maxps xmm,xmm Latency 2.98 0.0
maxps xmm,xmm Latency 1.98 0.0
maxps xmm,xmm + por xmm,xmm Latency 3.47 0.0
movdqa xmm,xmm + maxps xmm,xmm Latency 3.75 0.0
addps xmm,xmm Latency 1.98 0.0
pshufd xmm,xmm + maxpd xmm,xmm Latency 3.76 0.0
haddps + por + maxps xmm,xmm Latency 3.97 0.0
movsd xmm,xmm + maxpd xmm,xmm Latency 2.75 0.0
sqrtps xmm,xmm + maxps xmm,xmm Latency 13.55 0.19
haddps xmm,xmm + por xmm,xmm Latency 4.83 0.0
addpd xmm,xmm + por xmm,xmm Latency 3.47 0.0
mulps xmm,xmm + maxps xmm,xmm Latency 2.97 0.0
addps xmm,xmm + por xmm,xmm Latency 3.47 0.0
por xmm,xmm Latency 0.99 0.0
haddps xmm,xmm Latency 5.42 0.34
cvtdq2ps + por + maxps xmm,xmm Latency 7.33 0.02
cvtps2pd xmm,xmm Latency 5.5 0.0
haddps xmm,xmm + maxps xmm,xmm Latency 3.47 0.0
addps xmm,xmm + maxps xmm,xmm Latency 2.15 0.0
insertps xmm,xmm Latency 1.16 0.0
sqrtps xmm,xmm Latency 21.85 0.04
addpd xmm,xmm Latency 2.15 0.0
roundps xmm,xmm + por xmm,xmm Latency 4.46 0.0
shufpd xmm,xmm + maxpd xmm,xmm Latency 2.75 0.0
dppd xmm,xmm + maxpd xmm,xmm Latency 6.76 0.0
cvtps2pd xmm,xmm + por xmm,xmm Latency 5.36 0.0
maxps xmm,xmm + orps xmm,xmm Latency 2.98 0.0
movddup xmm,xmm Latency 1.26 0.0
movaps xmm,xmm + por xmm,xmm Latency 3.08 0.0
cvtpd2ps xmm,xmm Latency 5.5 0.0
orps xmm,xmm + maxps xmm,xmm Latency 2.98 0.0
dpps xmm,xmm Latency 15.68 0.2
movdqa xmm,xmm + por xmm,xmm Latency 1.08 0.0
blendps xmm,xmm Latency 1.08 0.0
unpcklps xmm,xmm + por xmm,xmm Latency 3.34 0.0
roundps xmm,xmm Latency 3.96 0.0
movss xmm,xmm Latency 1.16 0.0
movss xmm,xmm + maxps xmm,xmm Latency 2.98 0.0
cvtdq2ps xmm,xmm + por xmm,xmm Latency 3.75 0.0
cvtdq2ps xmm,xmm Latency 16.09 0.0
shufpd xmm,xmm Latency 1.25 0.0
haddps + maxps + por xmm,xmm Latency 3.64 0.0
movsldup xmm,xmm Latency 1.16 0.0
addps xmm,xmm + maxpd xmm,xmm Latency 16.09 0.0
shufps xmm,xmm Latency 1.16 0.0
shufpd xmm,xmm + por xmm,xmm Latency 3.09 0.0
cvtps2dq + maxps + por xmm,xmm Latency 7.62 0.0
movddup xmm,xmm + por xmm,xmm Latency 3.36 0.0
andps xmm,xmm Latency 1.16 0.0
movaps xmm,xmm + maxps xmm,xmm Latency 2.75 0.0
paddw xmm,xmm + orps xmm,xmm Latency 3.09 0.0
movsd xmm,xmm + por xmm,xmm Latency 3.16 0.03
movhlps xmm,xmm + maxps xmm,xmm Latency 2.75 0.0
cvtdq2ps + maxps + por xmm,xmm Latency 2.98 0.0
pshufd xmm,xmm + paddd xmm,xmm Latency 1.08 0.0
blendps xmm,xmm + maxps xmm,xmm Latency 1.98 0.0
orps xmm,xmm Latency 1.25 0.0
maxpd xmm,xmm + por xmm,xmm Latency 3.75 0.0
movaps xmm,xmm Latency 1.15 0.0
addpd xmm,xmm + orpd xmm,xmm Latency 2.75 0.0
insertps xmm,xmm + maxps xmm,xmm Latency 2.98 0.0
cvtps2dq xmm,xmm Latency 16.83 0.0
addps xmm,xmm + addpd xmm,xmm Latency 16.09 0.0
maxpd xmm,xmm + orpd xmm,xmm Latency 2.75 0.0
addps xmm,xmm + addps xmm,xmm Latency 1.98 0.0
unpcklps xmm,xmm Latency 1.16 0.0
mulps xmm,xmm + por xmm,xmm Latency 3.04 0.04
shufps xmm,xmm + paddd xmm,xmm Latency 3.1 0.0
movhlps xmm,xmm + por xmm,xmm Latency 3.34 0.0
cvtpd2ps xmm,xmm + por xmm,xmm Latency 4.46 0.0
unpcklps xmm,xmm + maxps xmm,xmm Latency 3.05 0.04
dppd xmm,xmm Latency 9.31 0.0
dpps xmm,xmm + maxps xmm,xmm Latency 11.14 0.04
maxps xmm,xmm + maxps xmm,xmm Latency 1.98 0.0
dppd + por + maxpd xmm,xmm Latency 5.89 0.6
paddw xmm,xmm Latency 0.99 0.0
andps xmm,xmm + maxps xmm,xmm Latency 2.75 0.0
shufpd r128,r128,i Throughput 0.54 0.0
mulpd r128,r128 Throughput 2.12 0.0
orpd r128,[m] Throughput 1.3 0.0
movshdup r128,r128 Throughput 0.57 0.0
dpps r128,[m128],i Throughput 9.55 0.0
cvttps2dq r128,[m] Throughput 1.23 0.0
mulps r128,r128 Throughput 1.06 0.0
comiss r128,[m] Throughput 1.23 0.0
andps r128,[m] Throughput 1.41 0.0
minps r128,[m] Throughput 1.23 0.0
unpcklpd r128,r128 Throughput 0.57 0.0
minsd r128,r128 Throughput 0.98 0.0
comiss r128,r128 Throughput 0.98 0.0
divss r128,r128 (best case) Latency 13.11 0.3
maxss r128,[m] Throughput 1.23 0.0
Throughput with memory operand: sqrtpd r128,[m128] (worst case) Throughput 119.92 0.03
cmpeqss r128,r128 Throughput 0.98 0.0
divsd r128,r128 (best case) Latency 15.25 0.0
cmpeqps r128,r128 Throughput 0.98 0.0
Throughput with memory operand: sqrtsd r128,[m128] (best case) Throughput 60.92 0.01
divpd r128,r128 (best case) Latency 24.64 0.01
Throughput with memory operand: sqrtpd r128,[m128] (best case) Throughput 119.94 0.03
dppd r128,r128,i Throughput 6.37 0.0
rcpss r128,r128 Latency 5.29 0.0
sqrtsd r128,r128 (worst case) Latency 61.16 2.86
cmpeqps r128,[m] Throughput 1.23 0.0
andps r128,r128 Throughput 0.54 0.0
Throughput with memory operand: sqrtsd r128,[m128] (worst case) Throughput 61.07 0.09
mulss r128,r128 Throughput 0.98 0.0
roundpd r128,[m128],i Throughput 1.96 0.0
roundpd r128,r128,i Throughput 2.13 0.0
mulsd r128,[m] Throughput 1.96 0.0
orpd r128,r128 Throughput 0.52 0.0
Throughput with memory operand: sqrtss r128,[m128] (worst case) Throughput 32.93 1.09
movapd r128,[m] Throughput 1.41 0.0
movshdup r128,[m] Throughput 1.31 0.0
ucomisd r128,[m] Throughput 1.23 0.0
roundsd r128,[m128],i Throughput 1.97 0.0
sqrtpd r128,r128 (worst case) Latency 119.53 4.96
Throughput with memory operand: rcpss r128,[m128] Throughput 4.9 0.0
unpckhps r128,[m] Throughput 1.3 0.0
cmpltpd r128,r128 Throughput 0.98 0.0
movapd r128,r128 Throughput 0.57 0.0
cvtps2pd r128,[m] Throughput 1.31 0.0
blendps r128,[m128],i Throughput 0.99 0.0
rsqrtss r128,r128 Latency 5.3 0.0
haddps r128,r128 Throughput 2.94 0.0
roundps r128,r128,i Throughput 1.96 0.0
ucomisd r128,r128 Throughput 0.98 0.0
Throughput with memory operand: rsqrtss r128,[m128] Throughput 5.31 0.0
cvtdq2ps r128,r128 Throughput 1.06 0.0
shufps r128,r128,i Throughput 0.54 0.0
rsqrtps r128,r128 Latency 13.7 0.04
addss r128,[m] Throughput 1.33 0.0
Throughput with memory operand: sqrtps r128,[m128] (best case) Throughput 62.88 0.01
movups r128,r128 Throughput 0.54 0.0
maxpd r128,[m] Throughput 1.23 0.0
shufps r128,[m128],i Throughput 1.41 0.0
shufpd r128,[m128],i Throughput 1.41 0.0
hsubpd r128,[m] Throughput 4.9 0.0
movupd r128,[m] Throughput 1.3 0.0
movaps r128,r128 Throughput 0.54 0.0
haddps r128,[m] Throughput 4.9 0.0
roundps r128,[m128],i Throughput 2.13 0.0
roundss r128,[m128],i Throughput 1.96 0.0
sqrtss r128,r128 (best case) Latency 32.97 1.59
subsd r128,r128 Throughput 0.98 0.0
movupd r128,r128 Throughput 0.57 0.0
movddup r128,[m] Throughput 1.3 0.0
dpps r128,r128,i Throughput 8.81 0.0
movddup r128,r128 Throughput 0.53 0.0
hsubpd r128,r128 Throughput 2.94 0.0
unpcklpd r128,[m] Throughput 1.3 0.0
blendpd r128,r128,i Throughput 0.98 0.0
mulps r128,[m] Throughput 1.23 0.0
cvtpd2ps r128,[m] Throughput 1.41 0.0
cvtdq2ps r128,[m] Throughput 1.23 0.0
addps r128,[m] Throughput 1.23 0.0
cvtpd2ps r128,r128 Throughput 1.43 0.0
divpd r128,r128 (worst case) Latency 19.56 116.23
movaps r128,[m] Throughput 1.41 0.0
maxss r128,r128 Throughput 0.98 0.0
cmpeqss r128,[m] Throughput 1.23 0.0
addss r128,r128 Throughput 0.98 0.0
Throughput with memory operand: sqrtps r128,[m128] (worst case) Throughput 63.07 0.31
rcpps r128,r128 Latency 13.7 0.04
blendpd r128,[m128],i Throughput 0.99 0.0
movups r128,[m] Throughput 1.3 0.0
minsd r128,[m] Throughput 1.23 0.0
Throughput with memory operand: sqrtss r128,[m128] (best case) Throughput 32.42 0.01
roundsd r128,r128,i Throughput 1.96 0.0
mulsd r128,r128 Throughput 1.96 0.0
cmpltpd r128,[m] Throughput 1.24 0.0
subpd r128,r128 Throughput 0.98 0.0
sqrtps r128,r128 (best case) Latency 63.15 3.16
movsldup r128,r128 Throughput 0.53 0.0
sqrtsd r128,r128 (best case) Latency 61.15 2.46
sqrtpd r128,r128 (best case) Latency 118.95 0.66
minps r128,r128 Throughput 0.98 0.0
sqrtps r128,r128 (worst case) Latency 62.37 0.03
Throughput with memory operand: rcpps r128,[m128] Throughput 22.29 0.72
cvtpd2dq r128,[m] Throughput 1.53 0.0
divps r128,r128 (worst case) Latency 14.33 28.79
Throughput with memory operand: rsqrtps r128,[m128] Throughput 22.08 0.55
blendps r128,r128,i Throughput 0.99 0.0
cvtps2pd r128,r128 Throughput 1.06 0.0
mulss r128,[m] Throughput 1.23 0.0
cvtdq2pd r128,[m] Throughput 1.33 0.0
subpd r128,[m] Throughput 1.23 0.0
divps r128,r128 (best case) Latency 23.38 0.01
addsubps r128,r128 Throughput 0.98 0.0
sqrtss r128,r128 (worst case) Latency 32.34 0.34
movsldup r128,[m] Throughput 1.75 1.03
subsd r128,[m] Throughput 1.23 0.0
cvttps2dq r128,r128 Throughput 1.06 0.0
cvtdq2pd r128,r128 Throughput 0.99 0.0
divsd r128,r128 (worst case) Latency 10.69 28.59
addps r128,r128 Throughput 0.98 0.0
unpckhps r128,r128 Throughput 0.54 0.0
addsubps r128,[m] Throughput 1.24 0.0
cvtpd2dq r128,r128 Throughput 1.43 0.0
dppd r128,[m128],i Throughput 5.88 0.0
divss r128,r128 (worst case) Latency 7.9 11.3
maxpd r128,r128 Throughput 0.98 0.0
mulpd r128,[m] Throughput 1.96 0.0
roundss r128,r128,i Throughput 1.96 0.0
Throughput with memory source operand: cmp r64,[m64] Throughput 1.07 0.0
Latency with memory destination operand: cmp [m64],r64 Latency 0.98 0.0
Throughput with memory destination operand: xchg [m32],r32 Throughput 17.85 1.81
Latency with memory destination operand: add [m64],r64 Latency 4.9 0.0
inc r64 Throughput 0.59 0.0
xchg r32,r32 Throughput 1.5 0.0
Throughput with memory destination operand: xor [m16],r16 Throughput 1.91 0.0
Latency with memory destination operand: test [m64],r64 Latency 0.98 0.0
Throughput with memory destination operand: add [m32],r32 Throughput 1.92 0.0
Latency with memory operand: sub [m32], i Latency 5.31 0.0
mov r8, i Throughput 0.64 0.0
dec r32 Throughput 0.64 0.0
Throughput with memory destination operand: or [m16],r16 Throughput 1.92 0.0
cdq Latency 0.5 0.0
xor r8,r8 Throughput 0.54 0.0
Latency with memory operand: not [m32] Latency 4.9 0.0
adc r32, i Throughput 0.98 0.0
or r8,r8 Throughput 0.5 0.0
Latency with memory destination operand: and [m64],r64 Latency 4.9 0.0
neg r8high Throughput 0.99 0.0
mov r8,r8 Throughput 0.5 0.0
Latency with memory destination operand: and [m8],r8 Latency 5.31 0.0
Latency with memory destination operand: add [m16],r16 Latency 4.9 0.0
pause Latency 5.88 0.0
xor eax,eax / cdqe Throughput 0.5 0.0
Throughput with memory operand: cmp [m32], i Throughput 0.99 0.0
cld Latency 3.18 0.0
adc r32,r32 Throughput 1.06 0.0
Latency with memory operand: cmp [m32], i Latency 1.15 0.17
Throughput with RIP address mode: mov r32, [m32] Throughput 1.41 0.0
Throughput with memory source operand: sub r64,[m64] Throughput 1.07 0.0
cmp r8high, i Throughput 0.5 0.0
clc Latency 3.03 0.0
or r8, i Throughput 0.64 0.0
popcnt r32,r32 Throughput 0.98 0.0
dec r8high Throughput 0.64 0.0
Throughput with memory source operand: xchg r16,[m16] Throughput 16.77 0.15
inc r8high Throughput 0.59 0.0
Throughput with memory destination operand: sub [m32],r32 Throughput 1.77 0.0
Latency with memory destination operand: add [m8],r8 Latency 5.31 0.0
Throughput with memory destination operand: and [m64],r64 Throughput 1.77 0.0
Throughput with memory source operand: sub r32,[m32] Throughput 1.24 0.17
xor eax,eax / cwd Throughput 0.5 0.0
dec r8 Throughput 0.59 0.0
mov r16,r16 Throughput 0.5 0.0
add r8,r8 Throughput 0.5 0.0
Throughput with memory source operand: add r16,[m16] Throughput 1.07 0.0
not r16 Throughput 0.64 0.0
sub r8high, i Throughput 0.64 0.0
Throughput with memory source operand: or r64,[m64] Throughput 0.98 0.0
add r16, i Throughput 0.59 0.0
Latency with memory destination operand: xchg [m8],r8 Latency 18.14 0.11
Throughput with memory source operand: imul Throughput 1.22 0.0
sfence Latency 15.86 0.0
Throughput with memory destination operand: test [m32],r32 Throughput 0.98 0.0
neg r32 Throughput 0.64 0.0
Throughput with memory source operand: adc r16,[m16] Throughput 0.98 0.0
xor eax,eax / cdq Throughput 0.54 0.0
adc r8high, i Throughput 1.06 0.0
nop Latency 0.64 0.0
Latency with memory destination operand: add [m32],r32 Latency 4.9 0.0
xor r8high, i Throughput 0.59 0.0
Latency with memory destination operand: adc [m8],r8 Latency 5.44 1.06
Latency with memory destination operand: sbb [m16],r16 Latency 5.62 0.89
Throughput with memory source operand: sbb r64,[m64] Throughput 1.07 0.0
Latency with memory operand: adc [m32], i Latency 5.31 0.0
Throughput with memory source operand: add r8,[m8] Throughput 1.09 0.0
add r16,r16 Throughput 0.5 0.0
adc r8,r8 Throughput 1.06 0.0
adc r16,r16 Throughput 1.06 0.0
cmp r32,r32 Throughput 0.54 0.0
cmc Latency 3.03 0.0
Throughput with memory source operand: adc r32,[m32] Throughput 0.98 0.0
Throughput with memory operand: mov [m32], i Throughput 1.98 0.15
Throughput with ABS64 address mode: mov r32, [m32] Throughput 1.32 0.0
Throughput with memory operand: test [m32], i Throughput 1.07 0.0
Latency with memory destination operand: adc [m16],r16 Latency 4.9 0.0
adc r16, i Throughput 1.06 0.0
Latency with memory operand: add [m32], i Latency 4.9 0.0
Throughput with memory source operand: or r8,[m8] Throughput 1.07 0.0
Throughput with memory destination operand: or [m32],r32 Throughput 1.92 0.0
imul r16,r16 Throughput 1.12 0.0
xor eax,eax / cwde Throughput 0.5 0.0
adc r8, i Throughput 1.07 0.0
not r8high Throughput 0.64 0.0
Throughput with memory destination operand: xor [m32],r32 Throughput 1.77 0.0
neg r8 Throughput 1.07 0.0
Throughput with memory source operand: xchg r64,[m64] Throughput 16.65 0.0
Throughput with memory source operand: add r32,[m32] Throughput 1.09 0.0
bswap r32 Throughput 1.07 0.0
and r16,r16 Throughput 0.54 0.0
Latency with memory destination operand: xchg [m32],r32 Latency 18.03 0.0
sbb r32,r32 Throughput 1.06 0.0
Throughput with memory source operand: xchg r32,[m32] Throughput 18.68 3.77
Latency with memory destination operand: or [m8],r8 Latency 5.31 0.0
add r32,r32 Throughput 0.5 0.0
Throughput with memory destination operand: xchg [m16],r16 Throughput 18.03 0.0
Throughput with memory operand: add [m32], i Throughput 1.96 0.0
Throughput with memory operand: neg [m32] Throughput 2.13 0.0
cmp r64, i Throughput 0.51 0.0
dec r16 Throughput 0.64 0.0
Latency with memory operand: test [m32], i Latency 1.07 0.0
Latency with memory destination operand: test [m32],r32 Latency 0.98 0.0
popcnt r64,r64 Throughput 0.98 0.0
Latency with memory destination operand: sbb [m32],r32 Latency 4.9 0.0
Throughput with memory source operand: test r8,[m8] Throughput 1.07 0.0
test r8,r8 Throughput 0.52 0.0
Throughput with memory source operand: adc r8,[m8] Throughput 1.0 0.0
Latency with memory destination operand: xchg [m64],r64 Latency 17.62 2.59
mov r64,r64 Throughput 0.5 0.0
cmp r8, i Throughput 0.5 0.0
or r8high, i Throughput 0.64 0.0
Latency with memory destination operand: or [m32],r32 Latency 5.31 0.0
Throughput with memory source operand: sbb r8,[m8] Throughput 0.98 0.0
Latency with memory operand: mov [m32], i Latency 1.84 0.0
Throughput with memory destination operand: cmp [m16],r16 Throughput 0.98 0.0
and r64, i Throughput 0.59 0.0
Throughput with memory source operand: cmp r16,[m16] Throughput 0.98 0.0
xor r16,r16 Throughput 0.54 0.0
xor r8, i Throughput 0.59 0.0
Latency with memory destination operand: xor [m16],r16 Latency 5.43 0.15
cwd Latency 0.5 0.0
Throughput with memory source operand: bsf Throughput 2.13 0.23
test r32,r32 Throughput 0.5 0.0
and r32, i Throughput 0.59 0.0
Latency with memory operand: sbb [m32], i Latency 5.31 0.0
Throughput with memory operand: or [m32], i Throughput 2.13 0.0
xor eax,eax / cqo Throughput 0.54 0.0
Throughput with memory source operand: xor r8,[m8] Throughput 0.98 0.0
popcnt r16,r16 Throughput 1.07 0.0
Throughput with memory source operand: popcnt Throughput 0.99 0.0
add r64,r64 Throughput 0.54 0.0
xchg r64,r64 Throughput 1.48 0.0
Throughput with memory source operand: sub r16,[m16] Throughput 0.98 0.0
cmp r64,r64 Throughput 0.5 0.0
bsr r32,r32 Throughput 1.96 0.0
prefetcht2 [m] Throughput 1.3 0.0
and r16, i Throughput 0.59 0.0
or r64, i Throughput 0.59 0.0
Latency with memory destination operand: xchg [m16],r16 Latency 18.03 0.0
Latency with memory destination operand: test [m8],r8 Latency 0.98 0.0
Throughput with memory operand: xor [m32], i Throughput 1.97 0.0
Latency with memory destination operand: cmp [m32],r32 Latency 0.98 0.0
lfence Latency 15.85 0.0
Throughput with INDIR address mode: mov [m32], r32 Throughput 1.65 0.0
Throughput with memory source operand: add r64,[m64] Throughput 0.99 0.0
Latency with memory destination operand: cmp [m16],r16 Latency 0.98 0.0
cmove r16,r16 Throughput 0.51 0.0
Throughput with memory source operand: xor r64,[m64] Throughput 0.99 0.0
test r8, i Throughput 0.5 0.0
Throughput with memory destination operand: mov [m64],r64 Throughput 1.79 0.0
dec r64 Throughput 0.64 0.0
Latency with memory destination operand: sbb [m64],r64 Latency 4.9 0.0
Latency with memory destination operand: mov [m8],r8 Latency 1.65 0.0
Throughput with memory source operand: and r8,[m8] Throughput 1.07 0.0
sub r8, i Throughput 0.59 0.0
sub r64, i Throughput 0.59 0.0
xor r16, i Throughput 0.59 0.0
bsr r16,r16 Throughput 1.96 0.0
Throughput with RIP address mode: mov [m32], r32 Throughput 1.79 0.0
sete r8 / neg r8 Latency 1.62 0.0
imul r64,r64 Throughput 2.12 0.0
sbb r8, i Throughput 1.06 0.0
Throughput with memory destination operand: cmp [m32],r32 Throughput 0.98 0.0
mov r32,r32 Throughput 0.5 0.0
xor r64, i Throughput 0.64 0.0
Throughput with memory source operand: and r16,[m16] Throughput 0.98 0.0
Throughput with memory destination operand: adc [m64],r64 Throughput 1.76 0.0
xor r64,r64 Throughput 0.54 0.0
test r64,r64 Throughput 0.5 0.0
sub r32, i Throughput 0.59 0.0
not r64 Throughput 0.64 0.0
Throughput with memory source operand: test r64,[m64] Throughput 1.06 0.0
prefetchnta [m] Throughput 1.3 0.0
Throughput with memory source operand: xor r16,[m16] Throughput 0.98 0.0
add r64, i Throughput 0.64 0.0
Latency with memory destination operand: or [m16],r16 Latency 5.31 0.0
Throughput with memory operand: sbb [m32], i Throughput 1.96 0.0
Latency with memory destination operand: cmp [m8],r8 Latency 0.98 0.0
Throughput with memory destination operand: mov [m16],r16 Throughput 1.79 0.0
neg r64 Throughput 0.64 0.0
Throughput with memory destination operand: and [m8],r8 Throughput 1.96 0.0
Throughput with memory operand: sub [m32], i Throughput 1.96 0.0
Latency with memory operand: neg [m32] Latency 5.3 0.0
cmp r8,r8 Throughput 0.54 0.0
Throughput with memory destination operand: adc [m32],r32 Throughput 1.92 0.0
cqo Latency 0.54 0.0
Throughput with memory source operand: mov r32,[m32] Throughput 1.41 0.0
sbb r16, i Throughput 0.98 0.0
and r8high, i Throughput 0.59 0.0
and r8,r8 Throughput 0.5 0.0
Throughput with memory source operand: and r32,[m32] Throughput 0.98 0.0
Throughput with memory destination operand: test [m64],r64 Throughput 0.99 0.0
test r64, i Throughput 0.55 0.0
Latency with memory destination operand: sub [m16],r16 Latency 4.9 0.0
Throughput with memory source operand: test r16,[m16] Throughput 1.07 0.0
xchg r16,r16 Throughput 1.9 0.0
Throughput with memory source operand: or r32,[m32] Throughput 0.98 0.0
Throughput with memory operand: inc [m32] Throughput 1.97 0.0
Throughput with memory destination operand: xor [m64],r64 Throughput 1.64 0.0
test r16,r16 Throughput 0.5 0.0
Throughput with memory source operand: sub r8,[m8] Throughput 0.98 0.0
cmp r32, i Throughput 0.5 0.0
Throughput with memory destination operand: sbb [m32],r32 Throughput 1.92 0.0
Latency with memory operand: or [m32], i Latency 5.31 0.0
add r8, i Throughput 0.64 0.0
add r32, i Throughput 0.64 0.0
Latency with memory destination operand: xor [m64],r64 Latency 5.31 0.0
Throughput with memory destination operand: sbb [m16],r16 Throughput 1.91 0.0
Throughput with memory destination operand: or [m64],r64 Throughput 1.77 0.0
sub r64,r64 Throughput 0.5 0.0
Throughput with memory operand: adc [m32], i Throughput 2.13 0.0
Throughput with memory destination operand: sub [m64],r64 Throughput 1.63 0.0
Throughput with memory source operand: and r64,[m64] Throughput 0.98 0.0
sete r8 Throughput 0.98 0.0
or r32, i Throughput 0.59 0.0
Latency with memory destination operand: mov [m16],r16 Latency 2.12 0.99
Throughput with memory operand: not [m32] Throughput 1.96 0.0
sbb r32, i Throughput 0.98 0.0
Throughput with memory source operand: or r16,[m16] Throughput 0.98 0.0
Throughput with memory destination operand: xchg [m8],r8 Throughput 18.04 0.0
Throughput with memory destination operand: add [m64],r64 Throughput 1.77 0.0
sub r16,r16 Throughput 0.54 0.0
Latency with memory destination operand: xor [m32],r32 Latency 5.3 0.0
Throughput with memory destination operand: sbb [m64],r64 Throughput 1.77 0.0
not r8 Throughput 0.64 0.0
inc r32 Throughput 0.59 0.0
std Latency 3.18 0.0
Latency with memory destination operand: adc [m64],r64 Latency 4.9 0.0
bsr r64,r64 Throughput 2.13 0.0
test r16, i Throughput 1.29 0.0
sub r16, i Throughput 0.59 0.0
mfence Latency 15.85 0.0
stc Latency 3.03 0.0
or r16, i Throughput 0.64 0.0
sbb r64,r64 Throughput 0.98 0.0
Throughput with memory source operand: test r32,[m32] Throughput 1.09 0.01
cmove r32,r32 Throughput 0.5 0.0
test r8high, i Throughput 0.5 0.0
Throughput with memory source operand: xchg r8,[m8] Throughput 16.64 0.0
adc r64,r64 Throughput 0.98 0.0
Latency with memory operand: xor [m32], i Latency 5.31 0.0
Throughput with memory destination operand: and [m32],r32 Throughput 1.92 0.0
Throughput with memory source operand: mov r16,[m16] Throughput 1.07 0.0
Throughput with memory destination operand: mov [m8],r8 Throughput 1.66 0.0
Throughput with memory destination operand: add [m16],r16 Throughput 1.91 0.0
Throughput with memory destination operand: sub [m16],r16 Throughput 1.77 0.0
Throughput with memory source operand: sbb r32,[m32] Throughput 0.98 0.0
Throughput with memory destination operand: xchg [m64],r64 Throughput 16.65 0.0
Throughput with memory destination operand: adc [m8],r8 Throughput 2.12 0.0
xor r32,r32 Throughput 0.54 0.0
mov r8high, i Throughput 0.64 0.0
Throughput with memory source operand: sbb r16,[m16] Throughput 0.98 0.0
sbb r8,r8 Throughput 1.06 0.0
Throughput with memory operand: and [m32], i Throughput 1.97 0.0
Throughput with memory destination operand: add [m8],r8 Throughput 1.96 0.0
imul r32,r32 Throughput 1.12 0.0
sub r8,r8 Throughput 0.55 0.0
mov r16, i Throughput 1.34 0.0
Throughput with ABS32 address mode: mov r32, [m32] Throughput 1.3 0.0
bsf r16,r16 Throughput 2.36 0.49
sete r8h Throughput 1.06 0.0
Latency with memory destination operand: test [m16],r16 Latency 0.98 0.0
Throughput with memory destination operand: xor [m8],r8 Throughput 1.96 0.0
Latency with memory destination operand: mov [m32],r32 Latency 1.74 0.01
cbw Latency 1.06 0.0
Throughput with memory source operand: cmp r32,[m32] Throughput 1.09 0.0
Throughput with ABS32 address mode: mov [m32], r32 Throughput 1.72 0.04
mov r64, i Throughput 0.59 0.0
prefetcht0 [m] Throughput 1.3 0.0
Throughput with memory destination operand: test [m16],r16 Throughput 1.07 0.0
Throughput with memory source operand: mov r8,[m8] Throughput 1.07 0.0
Throughput with memory destination operand: test [m8],r8 Throughput 1.06 0.0
or r64,r64 Throughput 0.54 0.0
cmp r16, i Throughput 0.5 0.0
cdqe Latency 0.98 0.0
and r32,r32 Throughput 0.5 0.0
Throughput with memory destination operand: and [m16],r16 Throughput 1.77 0.0
or r16,r16 Throughput 0.54 0.0
Latency with memory destination operand: adc [m32],r32 Latency 5.46 1.64
Latency with memory operand: and [m32], i Latency 4.9 0.0
Throughput with memory destination operand: cmp [m8],r8 Throughput 1.07 0.0
and r8, i Throughput 0.59 0.0
Latency with memory destination operand: mov [m64],r64 Latency 1.65 0.0
Throughput with INDIR address mode: mov r32, [m32] Throughput 1.41 0.0
mov r32, i Throughput 0.59 0.0
cmp r16,r16 Throughput 0.54 0.0
inc r8 Throughput 0.59 0.0
xor r32, i Throughput 0.59 0.0
Throughput with memory source operand: cmove Throughput 0.98 0.0
sbb r8high, i Throughput 0.98 0.0
Throughput with memory destination operand: sbb [m8],r8 Throughput 2.12 0.0
bsf r32,r32 Throughput 1.96 0.0
inc r16 Throughput 0.59 0.0
adc r64, i Throughput 0.98 0.0
bswap r64 Throughput 1.07 0.0
Latency with memory destination operand: sub [m64],r64 Latency 4.9 0.0
not r32 Throughput 0.59 0.0
xchg r8,r8 Throughput 2.6 0.0
sete [m8] Throughput 1.69 0.0
Throughput with memory operand: dec [m32] Throughput 1.96 0.0
cmove r64,r64 Throughput 0.51 0.0
Throughput with ABS64 address mode: mov [m32], r32 Throughput 1.65 0.0
Latency with memory destination operand: sub [m8],r8 Latency 4.9 0.0
Latency with memory operand: inc [m32] Latency 4.9 0.0
Throughput with memory destination operand: sub [m8],r8 Throughput 2.13 0.0
xor eax,eax / cbw Throughput 0.5 0.0
Throughput with memory source operand: mov r64,[m64] Throughput 1.3 0.0
or r32,r32 Throughput 0.54 0.0
Latency with memory destination operand: and [m32],r32 Latency 5.3 0.0
cwde Latency 0.98 0.0
sbb r16,r16 Throughput 1.07 0.0
Throughput with memory source operand: adc r64,[m64] Throughput 1.07 0.0
Latency with memory destination operand: sbb [m8],r8 Latency 5.61 0.82
bsf r64,r64 Throughput 1.96 0.0
Latency with memory destination operand: xor [m8],r8 Latency 5.01 0.11
Throughput with memory source operand: cmp r8,[m8] Throughput 0.98 0.0
neg r16 Throughput 0.99 0.0
Latency with memory destination operand: or [m64],r64 Latency 5.31 0.0
sbb r64, i Throughput 0.98 0.0
test r32, i Throughput 0.5 0.0
Latency with memory operand: dec [m32] Latency 4.9 0.0
Throughput with memory destination operand: cmp [m64],r64 Throughput 0.98 0.0
Latency with memory destination operand: and [m16],r16 Latency 5.31 0.0
Latency with memory destination operand: sub [m32],r32 Latency 4.9 0.0
Throughput with memory destination operand: or [m8],r8 Throughput 1.96 0.0
and r64,r64 Throughput 0.54 0.0
add r8high, i Throughput 0.64 0.0
Throughput with memory destination operand: adc [m16],r16 Throughput 1.91 0.0
Throughput with memory destination operand: mov [m32],r32 Throughput 1.79 0.0
prefetcht1 [m] Throughput 1.3 0.0
sub r32,r32 Throughput 0.5 0.0
Throughput with memory source operand: bsr Throughput 1.96 0.0
Throughput with memory source operand: xor r32,[m32] Throughput 0.98 0.0
Throughput with memory source operand: pmovzxwd r128,[m128] Throughput 0.99 0.0
punpcklqdq r128,r128 Throughput 0.98 0.0
pshufw r64,r64,i Throughput 0.98 0.0
Throughput with memory source operand: palignr r64,[m64],i Throughput 1.07 0.0
psrad r128,i Throughput 1.07 0.0
Throughput with memory source operand: punpckldq r64,[m64] Throughput 0.99 0.0
punpckldq r64,r64 Throughput 0.98 0.0
Throughput with memory source operand: packuswb r64,[m64] Throughput 0.98 0.0
paddd r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: pmovsxbw r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: pavgb r64,[m64] Throughput 1.2 0.1
Throughput with memory source operand: pmaddwd r128,[m128] Throughput 1.96 0.0
Throughput with memory source operand: packssdw r128,[m128] Throughput 0.99 0.0
pcmpgtd r64,r64 Throughput 0.98 0.0
paddusb r128,r128 Throughput 0.98 0.0
packssdw r64,r64 Throughput 1.16 0.0
psllq r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: pmullw r128,[m128] Throughput 1.23 0.0
Throughput with memory source operand: pand r64,[m64] Throughput 0.98 0.0
pcmpgtw r64,r64 Throughput 1.06 0.0
psllw r64,i Throughput 1.06 0.0
Throughput with memory source operand: psllw r64,[m64] Throughput 0.98 0.0
Throughput with memory source operand: pmulhuw r128,[m128] Throughput 1.23 0.0
Throughput with memory source operand: pcmpgtd r64,[m64] Throughput 0.98 0.0
pminub r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: pmovsxdq r128,[m128] Throughput 1.07 0.0
pmaxsw r128,r128 Throughput 0.98 0.0
Throughput with memory source operand: pshuflw r128,[m128] Throughput 1.07 0.0
pandn r64,r64 Throughput 0.98 0.0
Throughput with memory source operand: phsubd r128,[m128] Throughput 4.9 0.0
pmaddwd r64,r64 Throughput 1.96 0.0
pmaddubsw r64,r64 Throughput 7.83 0.0
mpsadbw r128,r128 Throughput 0.99 0.0
Throughput with memory source operand: punpckhwd r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: psraw r128,[m128] Throughput 1.07 0.0
pavgw r128,r128 Throughput 1.06 0.0
pblendw r128,r128 Throughput 0.98 0.0
pshufb r64,r64 Throughput 0.98 0.0
Throughput with memory source operand: mpsadbw r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: psllq r64,[m64] Throughput 0.99 0.0
Throughput with memory source operand: pshufhw r128,[m128] Throughput 1.21 0.1
Throughput with memory source operand: pmuldq r128,[m128] Throughput 1.23 0.0
pcmpeqw r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: pandn r128,[m128] Throughput 1.07 0.0
pminub r128,r128 Throughput 1.41 1.06
Throughput with memory source operand: pmaxuw r128,[m128] Throughput 1.07 0.0
Throughput with memory source operand: pmuludq r128,[m128] Throughput 1.23 0.0
Throughput with memory source operand: movdqa r128,[m128] Throughput 1.41 0.0
pabsb r64,r64 Throughput 0.98 0.0
psignw r64,r64 Throughput 0.98 0.0
movdqa r128,r128 Throughput 1.07 0.0
phaddsw r128,r128 Throughput 3.18 0.0
pmullw r128,r128 Throughput 0.98 0.0
psadbw r64,r64 Throughput 1.07 0.0
Throughput with memory source operand: psignw r64,[m64] Throughput 0.99 0.0
Throughput with memory source operand: paddw r128,[m128] Throughput 1.07 0.0
punpckhdq r64,r64 Throughput 0.98 0.0
Throughput with memory source operand: pminsb r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: pminub r64,[m64] Throughput 0.98 0.0
pmuludq r128,r128 Throughput 1.06 0.0
psllq r128,r128 Throughput 0.98 0.0
Throughput with memory source operand: palignr r128,[m128] Throughput 0.99 0.0
pmaxud r128,r128 Throughput 1.07 0.0
pmuldq r128,r128 Throughput 0.98 0.0
Throughput with memory source operand: paddusb r64,[m64] Throughput 0.98 0.0
Throughput with memory source operand: punpcklqdq r128,[m128] Throughput 1.55 1.15
psrad r64,i Throughput 1.06 0.0
pavgw r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: paddd r64,[m64] Throughput 0.98 0.0
Throughput with memory source operand: pcmpeqb r128,[m128] Throughput 1.07 0.0
Throughput with memory source operand: pmaddubsw r64,[m64] Throughput 8.81 0.0
pmuludq r64,r64 Throughput 0.99 0.0
punpckhwd r64,r64 Throughput 0.98 0.0
pcmpeqd r128,r128 Throughput 1.06 0.0
Throughput with memory source operand: pshufb r128,[m128] Throughput 1.07 0.0
pmovzxwd r128,r128 Throughput 0.98 0.0
pmaxsw r64,r64 Throughput 0.98 0.0
psrlq r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: psllq r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: punpckhdq r64,[m64] Throughput 0.98 0.0
pabsb r128,r128 Throughput 0.98 0.0
Throughput with memory source operand: psllw r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: paddq r64,[m64] Throughput 0.98 0.0
Throughput with memory source operand: pblendw r128,[m128] Throughput 0.99 0.0
pmaddubsw r128,r128 Throughput 8.62 0.15
Throughput with memory source operand: pcmpeqw r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: pmaddubsw r128,[m128] Throughput 9.54 0.0
Throughput with memory source operand: punpckhdq r128,[m128] Throughput 0.99 0.0
paddw r64,r64 Throughput 1.06 0.0
Throughput with memory source operand: pcmpgtd r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: pabsb r128,[m128] Throughput 0.99 0.0
Throughput with memory source operand: psignw r128,[m128] Throughput 0.99 0.0
psrlq r128,i Throughput 1.07 0.0
psadbw r128,r128 Throughput 0.98 0.0
Throughput with memory source operand: paddq r128,[m128] Throughput 0.99 0.0
packssdw r128,r128 Throughput 1.07 0.0
Throughput with memory source operand: pcmpeqq r128,[m128] Throughput 1.07 0.0
punpckhwd r128,r128 Throughput 0.98 0.0
psraw r64,r64 Throughput 0.98 0.0
Throughput with memory source operand: pcmpeqb r64,[m64] Throughput 0.98 0.0
Throughput with memory source operand: pmaxsw r128,[m128] Throughput 0.99 0.0
pmullw r64,r64 Throughput 1.06 0.0
paddw r128,r128 Throughput 0.98 0.0
Throughput with memory source operand: pandn r64,[m64] Throughput 0.98 0.0
Throughput with memory source operand: punpckhqdq r128,[m128] Throughput 0.99 0.0
pand r64,r64 Throughput 0.98 0.0