
Commit f561cb1

update mb
1 parent 58d110c commit f561cb1


memory_barrier.md

Lines changed: 30 additions & 21 deletions
@@ -122,26 +122,33 @@

![](images/mesi_1.jpg)

Below are the different types of processor requests and bus-side requests.

Processor requests to the Cache include the following operations:

- PrRd: the processor requests to read a Cache block.
- PrWr: the processor requests to write a Cache block.

Bus-side requests include:

- BusRd: a snooped request indicating that another processor is issuing a read request for a Cache block.
- BusRdX: a snooped request indicating that another processor is issuing a write request for a Cache block that it does not already hold.
- BusUpgr: a snooped request indicating that another processor is issuing a write request for a Cache block that it already holds in its Cache.
- Flush: a snooped request indicating that an entire Cache block has been written back to main memory by another processor.
- FlushOpt: a snooped request indicating that an entire Cache block has been posted on the bus to supply it to another processor (a Cache-to-Cache transfer).

Cache-to-Cache transfers can reduce the latency caused by a read miss: without them, the block would first have to be written back to main memory and then read from there, which costs far more. In bus-based systems this conclusion holds. In multicore architectures, however, where coherence is maintained at the level of the L2 caches and there is an on-chip L3 cache, fetching the missed block from L3 may be faster than fetching it from another L2.
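
To make these requests concrete, here is a minimal sketch (my own illustration, not code from the article) of how one cache line's MESI state could be updated in response to the processor-side and bus-side events listed above; the type and function names are hypothetical.

```c
#include <stdbool.h>

/* MESI states of a single cache line in a single cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;

/* Processor-side requests and snooped bus-side requests, as described above. */
typedef enum { PR_RD, PR_WR, BUS_RD, BUS_RDX, BUS_UPGR } mesi_event_t;

/* Returns the next state of the line.  `shared_by_others` only matters on a
 * read miss, where it decides between Exclusive and Shared.  A real cache
 * controller would also issue BusRd/BusRdX/BusUpgr and perform Flush/FlushOpt
 * as side effects; these are noted in the comments. */
static mesi_state_t mesi_next(mesi_state_t s, mesi_event_t e, bool shared_by_others)
{
    switch (s) {
    case INVALID:
        if (e == PR_RD) return shared_by_others ? SHARED : EXCLUSIVE; /* issue BusRd  */
        if (e == PR_WR) return MODIFIED;                              /* issue BusRdX */
        return INVALID;                  /* snooped requests are ignored when Invalid */
    case SHARED:
        if (e == PR_WR) return MODIFIED;                              /* issue BusUpgr */
        if (e == BUS_RDX || e == BUS_UPGR) return INVALID;            /* another writer */
        return SHARED;                   /* PrRd and BusRd leave the line Shared */
    case EXCLUSIVE:
        if (e == PR_WR) return MODIFIED; /* silent upgrade, no bus transaction */
        if (e == BUS_RD) return SHARED;  /* FlushOpt: supply the block to the reader */
        if (e == BUS_RDX) return INVALID;/* FlushOpt, then drop the line */
        return EXCLUSIVE;
    case MODIFIED:
        if (e == BUS_RD) return SHARED;  /* Flush: write back, then share */
        if (e == BUS_RDX) return INVALID;/* Flush: write back, then drop */
        return MODIFIED;                 /* local reads and writes keep the line dirty */
    }
    return s;
}
```

Note that a write from Exclusive needs no bus transaction at all, while the same write from Shared must broadcast BusUpgr so that the other copies get invalidated.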

## Reordering caused by the compiler

## lfence, sfence, mfence

## acquire/release abstraction

## write barrier, read barrier

## memory order
@@ -160,18 +167,20 @@

Unlocked increment:

1. Acquire cache line, shareable is fine. Read the value.
2. Add one to the read value.
3. Acquire cache line exclusive (if not already E or M) and lock it.
4. Write the new value to the cache line.
5. Change the cache line to modified and unlock it.

Locked increment:

1. Acquire cache line exclusive (if not already E or M) and lock it.
2. Read value.
3. Add one to it.
4. Write the new value to the cache line.
5. Change the cache line to modified and unlock it.

Notice the difference? In the unlocked increment, the cache line is only locked during the write memory operation, just like all writes. In the locked increment, the cache line is held across the entire instruction, all the way from the read operation to the write operation and including during the increment itself.
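
As a rough illustration (my own sketch, not code from the article), the same contrast shows up when incrementing a shared counter in C: a plain increment compiles to separate load, add, and store steps, following the unlocked sequence above, while C11's `atomic_fetch_add` is typically emitted on x86 as a single LOCK-prefixed instruction, following the locked sequence. The variable and function names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

uint64_t plain_counter;             /* ordinary variable  */
_Atomic uint64_t locked_counter;    /* C11 atomic counter */

void plain_increment(void)
{
    /* Read, add, write as three separate steps: another core can update the
     * counter between the read and the write, so concurrent increments can be
     * lost.  The cache line is held exclusively only for the final store, as
     * in the "unlocked increment" sequence above. */
    plain_counter = plain_counter + 1;
}

void locked_increment(void)
{
    /* On x86 this typically compiles to a single `lock add`/`lock xadd`: the
     * cache line is held exclusively across the whole read-modify-write, as
     * in the "locked increment" sequence above. */
    atomic_fetch_add(&locked_counter, 1);
}
```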
Also, some CPUs have things other than memory caches that can affect memory visibility. For example, some CPUs have a read prefetcher or a posted write buffer that can result in memory operations executing out of order. Where needed, a LOCK prefix (or equivalent functionality on other CPUs) will also do whatever needs to be done to handle memory operation ordering issues.
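
A posted write buffer is exactly what makes the classic store-buffering (SB) outcome possible: two threads each store to one flag and then load the other, and both loads can return 0 unless something drains the buffer. Below is a minimal C11 sketch of that case (my own illustration, not from the article; the thread functions and variable names are hypothetical), using a sequentially consistent fence, which compilers typically emit on x86 as `mfence` or a LOCK-prefixed instruction.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int x, y;     /* flags, each written by one thread   */
int r1, r2;          /* values observed by the other thread */

static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full barrier: forces the store to x to become globally visible before
     * the load of y.  Without it (and without the matching fence in thread_b),
     * both stores can sit in their cores' write buffers and r1 == r2 == 0 is
     * a legal outcome even on x86. */
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* With both fences in place, at least one of r1, r2 must be 1. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}
```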
