- in-place streaming with [Esoteric-Pull](https://doi.org/10.3390/computation10060092): eliminates redundant copy `B` of density distribution functions (DDFs) in memory; almost cuts memory demand in half and slightly increases performance due to implicit bounce-back boundaries; offers optimal memory access patterns for single-cell in-place streaming
- [decoupled arithmetic precision (FP32) and memory precision (FP32 or FP16S or FP16C)](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats): all arithmetic is done in FP32 for compatibility on all hardware, but DDFs in memory can be compressed to FP16S or FP16C: almost cuts memory demand in half again and almost doubles performance, without impacting overall accuracy for most setups
- <details><summary>only 8 flag bits per lattice point (can be used independently / at the same time)</summary>

  - `TYPE_S` (stationary or moving) solid boundaries
  - `TYPE_E` equilibrium boundaries (inflow/outflow)
  - `TYPE_T` temperature boundaries
  - `TYPE_F` free surface (fluid)
  - `TYPE_I` free surface (interface)
  - `TYPE_G` free surface (gas)
  - `TYPE_X` remaining for custom use or further extensions
  - `TYPE_Y` remaining for custom use or further extensions

</details>
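
Because each of the eight flag types takes one bit of a single byte, several of them can be set on the same lattice point at once. The following is a minimal Python sketch of that idea, not code from the project: the bit positions are assumed for illustration, and the helpers `has_flag`, `set_flag` and `clear_flag` are hypothetical names.

```python
# Assumed bit positions (illustrative only): one bit per flag type, 8 bits total.
TYPE_S = 0b00000001  # (stationary or moving) solid boundary
TYPE_E = 0b00000010  # equilibrium boundary (inflow/outflow)
TYPE_T = 0b00000100  # temperature boundary
TYPE_F = 0b00001000  # free surface (fluid)
TYPE_I = 0b00010000  # free surface (interface)
TYPE_G = 0b00100000  # free surface (gas)
TYPE_X = 0b01000000  # remaining for custom use
TYPE_Y = 0b10000000  # remaining for custom use


def has_flag(flags: int, flag: int) -> bool:
    """Return True if every bit of `flag` is set in `flags`."""
    return (flags & flag) == flag


def set_flag(flags: int, flag: int) -> int:
    """Return `flags` with the bits of `flag` set, kept within one byte."""
    return (flags | flag) & 0xFF


def clear_flag(flags: int, flag: int) -> int:
    """Return `flags` with the bits of `flag` cleared, kept within one byte."""
    return flags & ~flag & 0xFF


# One lattice point that is both a solid boundary and a temperature boundary.
cell = set_flag(set_flag(0, TYPE_S), TYPE_T)
```

Here `has_flag(cell, TYPE_S)` and `has_flag(cell, TYPE_T)` are both true, while `has_flag(cell, TYPE_E)` is false, so the two boundary types are stored independently in the same byte.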
- large cost saving: comparison of maximum single-GPU grid resolution for D3Q19 LBM
- [DDF-shifting](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats) and other algebraic optimizations to minimize round-off error
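
To illustrate why DDF-shifting helps when DDFs are stored in a 16-bit format, the sketch below stores a value either directly or as its deviation from the lattice weight, and compares the round-off error after loading it back. It rests on several assumptions: it uses plain IEEE-754 half precision from Python's `struct` module rather than the custom FP16S or FP16C formats, the arithmetic runs in Python's double precision rather than FP32, and the function names are hypothetical. It is not the project's implementation.

```python
import struct

W0 = 1.0 / 3.0  # D3Q19 lattice weight of the rest population


def to_fp16(x: float) -> float:
    """Round x to IEEE-754 half precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def store_plain(f: float) -> float:
    """Store the DDF itself in 16 bits."""
    return to_fp16(f)


def load_plain(stored: float) -> float:
    return stored


def store_shifted(f: float, w: float) -> float:
    """Store only the deviation from the lattice weight in 16 bits."""
    return to_fp16(f - w)


def load_shifted(stored: float, w: float) -> float:
    """Add the lattice weight back in higher precision after loading."""
    return stored + w


# A DDF close to equilibrium at rest: the weight plus a small deviation.
f = W0 + 1e-5

err_plain = abs(load_plain(store_plain(f)) - f)
err_shifted = abs(load_shifted(store_shifted(f, W0), W0) - f)
```

In this example the directly stored value loses the small deviation entirely, because half precision resolves only steps of about 2.4e-4 near 1/3, whereas the shifted value keeps it to within the much finer spacing available near zero. As a result, `err_shifted` comes out several orders of magnitude smaller than `err_plain`.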