UICA Online Microarchitecture Analysis Tool
https://uica.uops.info/
Perf Wiki Tutorial
https://perf.wiki.kernel.org/index.php/Tutorial
Denis Bakhvalov's article : Visualizing Performance-Critical Dependency Chains
https://easyperf.net/blog/2022/05/11/Visualizing-Performance-Critical-Dependency-Chains
Intel's "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures" documentation
https://www.yumpu.com/en/document/view/27787317/how-to-benchmark-code-execution-times-on-intel-ia-32-and-ia-64-
ACPI on Wikipedia
https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface
AMD Turbo Core on Wikipedia
https://en.wikipedia.org/wiki/AMD_Turbo_Core
AMD cache injection
https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/white-papers/58725.pdf
AMD Ryzen Processor Software Optimization by Ken Mitchell, GDC 2022
https://gpuopen.com/gdc-presentations/2022/GDC_AMD_Ryzen_Processor_Software_Optimization.pdf
SIMD : GCC auto vectorisation
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Daniel Lemire's article : AVX-512: when and how to use these new instructions
https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
simdjson on GitHub
https://github.com/simdjson/simdjson
Memory disambiguation on Wikipedia
https://en.wikipedia.org/wiki/Memory_disambiguation
Memory disambiguation - Store to load forwarding on Wikipedia
https://en.wikipedia.org/wiki/Memory_disambiguation#Store_to_load_forwarding
Load Hit Store on Wikipedia
https://en.wikipedia.org/wiki/Load-Hit-Store
__restrict__ / Restricting Pointer Aliasing on GCC docs site
https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html
Microsoft's Xbox 360, Sony's PS3 - A Hardware Discussion on Anandtech
https://www.anandtech.com/show/1719/5
Elan Ruskin's article : LOAD-HIT-STORES AND THE __RESTRICT KEYWORD
https://web.archive.org/web/20210120214304/http://assemblyrequired.crashworks.org/load-hit-stores-and-the-__restrict-keyword/
Subnormal numbers on Wikipedia
https://en.wikipedia.org/wiki/Subnormal_number
Bruce Dawson's article : That’s Not Normal–the Performance of Odd Floats
https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
Intel Intrinsics Guide
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
Intel AVX10 Specs
https://www.intel.com/content/www/us/en/content-details/828964/intel-advanced-vector-extensions-10-1-intel-avx10-1-architecture-specification.html
Intel AES on Wikipedia
https://en.wikipedia.org/wiki/AES_instruction_set
Intel Vector Neural Network Instructions
https://en.wikipedia.org/wiki/AVX-512#VNNI
Perceptrons on Wikipedia
https://en.wikipedia.org/wiki/Perceptron
Dynamic Branch Prediction with Perceptrons, by Daniel A. Jimenez and Calvin Lin
https://www.cs.utexas.edu/~lin/papers/hpca01.pdf
Kernel.org page : The kernel’s command-line parameters to disable security mitigations
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
Controlling the Performance Impact of Microcode and Security Patches for CVE-2017-5754 CVE-2017-5715 and CVE-2017-5753 using Red Hat Enterprise Linux Tunables
https://access.redhat.com/articles/3311301
Meltdown paper
https://meltdownattack.com/meltdown.pdf
Spectre paper
https://spectreattack.com/spectre.pdf
Marek Majkowski's article : Branch predictor: How many "if"s are too many? Including x86 and M1 benchmarks!
https://blog.cloudflare.com/branch-predictor/
Broadwell microarchitecture
https://en.wikipedia.org/wiki/Broadwell_(microarchitecture)
Cache prefetching on Wikipedia
https://en.wikipedia.org/wiki/Cache_prefetching
Intel® Data Direct I/O Technology (Intel® DDIO): A Primer
https://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/data-direct-i-o-technology-brief.pdf
Raymond Chen's article : Why is address space allocation granularity 64KB?
https://devblogs.microsoft.com/oldnewthing/20031008-00/?p=42223
Why DDR5 is the Industry’s Powerful Next-gen Memory?
https://news.skhynix.com/why-ddr5-is-the-industrys-powerful-next-gen-memory/
DIMM image source
https://pixabay.com/vectors/dimm-ram-memory-ram-computer-23265/
Intel hybrid architecture
https://www.intel.com/content/www/us/en/developer/articles/technical/hybrid-architecture.html
Intel Alder Lake E-cores sharing L2 cache on Wikipedia
https://en.wikipedia.org/wiki/Alder_Lake
AMD Phoenix 2 hybrid CPUs
https://www.tomshardware.com/news/amd-phoenix-2-review-evaluates-zen-4-zen-4c-performance
AMD CCX
https://www.tomshardware.com/reviews/amd-ccx-definition-cpu-core-explained,6338.html
MESI protocol on Wikipedia
https://en.wikipedia.org/wiki/MESI_protocol
Intel MESIF protocol on Wikipedia
https://en.wikipedia.org/wiki/MESIF_protocol
AMD MOESI protocol on Wikipedia
https://en.wikipedia.org/wiki/MOESI_protocol
Erik Rigtorp's article : Optimising a ring buffer for throughput
https://rigtorp.se/ringbuffer/
Jeff Preshing's article : Memory Ordering at Compile Time
https://preshing.com/20120625/memory-ordering-at-compile-time/
Detecting and handling split locks (in Linux kernel) on lwn.net
https://lwn.net/Articles/790464/
CAS / Compare and swap on Wikipedia
https://en.wikipedia.org/wiki/Compare-and-swap
Test-And-Set
https://en.wikipedia.org/wiki/Test-and-set
Intel sticks another nail in the coffin of TSX with feature-disabling microcode update
https://www.theregister.com/2021/06/29/intel_tsx_disabled/
AMD's Advanced Synchronization Facility (transactional memory) on Wikipedia
https://en.wikipedia.org/wiki/Advanced_Synchronization_Facility
Intel Cache Allocation Technology page
https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-cache-allocation-technology.html
Intel Code and Data Prioritisation page
https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-code-and-data-prioritization-with-usage-models.html
AMD64 Technology Platform Quality of Service Extensions
https://kib.kiev.ua/x86docs/AMD/MISC/56375_1.00_PUB.pdf
Intel Memory Bandwidth Allocation page
https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-memory-bandwidth-allocation.html
lstopo on linux.die.net
https://linux.die.net/man/1/lstopo
Intel Memory Latency Checker
https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html
Dell's AMD NUMA nodes per socket article
https://infohub.delltechnologies.com/l/cpu-best-practices-3/poweredge-numa-nodes-per-socket-1#:~:text=AMD%20servers%20provide%20the%20ability,bank%20into%20two%20equal%20parts.
Intel VTune
https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
Andi Kleen's PMU tools / Toplev on GitHub
https://github.com/andikleen/pmu-tools
How TMA Addresses Challenges in Modern Servers and Enhancements Coming in IceLake by Ahmad Yasin
https://dyninst.github.io/scalable_tools_workshop/petascale2018/assets/slides/TMA%20addressing%20challenges%20in%20Icelake%20-%20Ahmad%20Yasin.pdf
Infographics: Operation Costs in CPU Clock Cycles on IT Hare
http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/