Commit 0447528

update mb
1 parent cdb4d5a commit 0447528

1 file changed: +8 -1 lines changed

memory_barrier.md

Lines changed: 8 additions & 1 deletion
@@ -151,16 +151,20 @@ Store Buffer:
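The direct consequence of a CPU having a store buffer is that when the CPU issues a write, the write is not applied to the cache immediately. So whenever the CPU reads from a cache line, it must first scan its own store buffer for a pending write to the same line, because this CPU may have written to that line earlier and the data may not yet have been flushed to the cache (the earlier write is still waiting in the store buffer). Note that although a CPU can read values it previously wrote to its own store buffer, other CPUs cannot see those values until the CPU flushes the store buffer to the cache. In other words, a store buffer cannot be accessed across cores: a core cannot see another core's store buffer.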
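The classic "store buffering" litmus test makes this visible from software. The sketch below is only an illustration: `count_both_zero` is a hypothetical helper name, and whether the relaxed variant ever observes the reordering depends on the hardware, the compiler, and timing. What the C++ memory model does guarantee is that with `std::memory_order_seq_cst` on every access, the two threads can never both read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and returns how many runs ended with BOTH threads reading 0.
int count_both_zero(int iterations,
                    std::memory_order store_order,
                    std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;
        std::thread t1([&] {
            x.store(1, store_order);  // may still be waiting in t1's store buffer
            r1 = y.load(load_order);  // can be satisfied before the store is visible
        });
        std::thread t2([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // neither thread saw the other's store
        }
    }
    return both_zero;
}
```

Calling it with `std::memory_order_relaxed` for both arguments may return a non-zero count on a multi-core machine, although nothing forces the reordering to show up in any particular run. Calling it with `std::memory_order_seq_cst` for both arguments must return 0.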

-Invalidate Queues
+Invalidate Queues

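To handle invalidation messages, CPUs implement an invalidate queue for incoming invalidate requests. When a request arrives it can be acknowledged immediately without being processed immediately. Instead, the invalidation message is just pushed onto the invalidation queue and processed as soon as possible afterwards (but not right away). As a result, a CPU may not know that a cache line in its cache is actually invalid, because the invalidation queue holds messages that have been received but not yet applied. This happens because the CPU and the invalidation queue physically sit on opposite sides of the cache.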

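As a result, memory barriers are required. A store barrier flushes the store buffer, ensuring that all writes have been applied to the CPU's cache. A read barrier flushes the invalidation queue, ensuring that writes by other CPUs are visible to the CPU performing the flush. Furthermore, the MMU cannot scan the store buffer, which causes similar problems. This effect already occurs on single-threaded processors.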
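In portable C++ the closest counterparts to these barriers are `std::atomic_thread_fence(std::memory_order_release)` on the writer side and `std::atomic_thread_fence(std::memory_order_acquire)` on the reader side. The following is a minimal message-passing sketch. Mapping the release fence onto "drain the store buffer" and the acquire fence onto "drain the invalidate queue" is an analogy, not what every compiler or CPU literally does. The names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag published after the data

void producer() {
    payload = 42;
    // Write-barrier role: the payload write is ordered before the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // spin until the flag is observed
    }
    // Read-barrier role: the payload read is ordered after the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Hypothetical driver: runs one producer/consumer pair and returns
// the value the consumer observed.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

With both fences in place the consumer is guaranteed to read 42 once it has seen `ready == true`. Removing either fence makes the access to `payload` a data race, which is undefined behavior in C++.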

## lfence, sfence, mfence

+https://stackoverflow.com/questions/27595595/when-are-x86-lfence-sfence-and-mfence-instructions-required
+
## acquire/release abstraction

+https://preshing.com/20130922/acquire-and-release-fences/
+
## write barrier, read barrier

## memory order
@@ -169,6 +173,8 @@ std::memory_order specifies how memory accesses, including regular, non-atomic m
The default behavior of all atomic operations in the library provides for sequentially consistent ordering (see discussion below). That default can hurt performance, but the library's atomic operations can be given an additional std::memory_order argument to specify the exact constraints, beyond atomicity, that the compiler and processor must enforce for that operation.
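A small sketch of passing that extra argument: a shared counter only needs atomicity, not ordering relative to other memory, so `std::memory_order_relaxed` is enough for the increments. The function name `count_with_relaxed` and its parameters are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one counter using relaxed
// increments; the final total is read after all threads have joined.
long count_with_relaxed(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Explicit ordering argument: atomic, but no ordering constraints.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // No argument given, so this load uses the default: memory_order_seq_cst.
    return counter.load();
}
```

The total is still exact, because relaxed operations stay atomic. What is given up is any guarantee about how these increments are ordered relative to accesses to other variables.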

+## sequential consistency
+
## cache coherency vs memory consistency
The MESI protocol makes the memory caches effectively invisible. This means that multithreaded programs don't have to worry about a core reading stale data from them or two cores writing to different parts of a cache line and getting half of one write and half of the other sent to main memory.
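The "two cores writing to different parts of a cache line" case can be sketched as follows. The 64-byte line size behind `alignas(64)` is an assumption about the target CPU, and `AdjacentCounters` and `bump_neighbours` are hypothetical names. The two array elements are distinct memory locations, so in C++ terms there is no data race even though they most likely share one cache line.

```cpp
#include <array>
#include <cassert>
#include <thread>

// Two counters placed next to each other; with the assumed 64-byte
// alignment they are expected to land in the same cache line.
struct AdjacentCounters {
    alignas(64) std::array<unsigned, 2> slot{};
};

// Hypothetical helper: each thread increments only its own slot n times.
std::array<unsigned, 2> bump_neighbours(unsigned n) {
    AdjacentCounters c;
    std::thread a([&] {
        for (unsigned i = 0; i < n; ++i) c.slot[0] += 1;
    });
    std::thread b([&] {
        for (unsigned i = 0; i < n; ++i) c.slot[1] += 1;
    });
    a.join();
    b.join();
    return c.slot;  // neither thread's writes clobber the other's
}
```

Coherence keeps the result correct, but not fast: the line can bounce between the two cores (false sharing). Coherence also says nothing about the order in which writes to different lines become visible, which is the memory consistency question.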
@@ -270,6 +276,7 @@ https://webcourse.cs.technion.ac.il/234267/Spring2016/ho/WCFiles/tirgul%205%20me
https://www.scss.tcd.ie/Jeremy.Jones/VivioJS/caches/MESIHelp.htm

http://15418.courses.cs.cmu.edu/spring2017/lectures
+
https://software.intel.com/en-us/articles/how-memory-is-accessed

https://software.intel.com/en-us/articles/detect-and-avoid-memory-bottlenecks#_Move_Instructions_into
