diff --git a/_collections/_portal_posts/2025-07-08-sycl-runtime-compilation.md b/_collections/_portal_posts/2025-07-08-sycl-runtime-compilation.md
index c533cf6..fdcd7f2 100644
--- a/_collections/_portal_posts/2025-07-08-sycl-runtime-compilation.md
+++ b/_collections/_portal_posts/2025-07-08-sycl-runtime-compilation.md
@@ -3,7 +3,7 @@ category: blogs
 date: '2025-07-08T09:00:00.0'
 hidden: false
 layout: portal/portal-article-view
-thumbnail: /assets/images/portal/article-images/2025-07-02-sycl-10-years/thumbnail.webp
+thumbnail: /assets/images/portal/article-images/2025-07-08-sycl-runtime-compilation/thumbnail.webp
 title: 'SYCL Runtime Compilation: A New Way to Specialise Kernels Using C++ Metaprogramming'
 user_id: 501121313
 ---
diff --git a/_collections/_portal_posts/2025-08-20-sycl-rtc.md b/_collections/_portal_posts/2025-08-20-sycl-rtc.md
new file mode 100644
index 0000000..5e2bd6c
--- /dev/null
+++ b/_collections/_portal_posts/2025-08-20-sycl-rtc.md
@@ -0,0 +1,310 @@
+---
+category: blogs
+date: '2025-08-20T02:00:00.0'
+hidden: false
+layout: portal/portal-article-view
+thumbnail: /assets/images/portal/article-images/2025-08-20-sycl-rtc/thumbnail.webp
+title: 'SYCL Runtime Compilation: A Look Behind the Scenes'
+user_id: 501121313
+---
+
+In our previous [blog post]({{ '/portal/blogs/2025/07/08/sycl-runtime-compilation' | relative_url }}), we showcased SYCL
+runtime compilation (**SYCL-RTC**) as a powerful new tool for kernel specialisation from a user perspective. In this
+article, we explore what actually happens when the application calls
+the [`sycl::ext::oneapi::experimental::build(...)`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#new-free-functions-to-create-and-build-kernel-bundles)
+function in the DPC++ implementation, and why we built it that way. It is also a tale of how **modular compiler
+technology** allowed us to deliver a faster and more secure in-memory compilation pipeline in just a couple months.
+
+For additional context, also check out our talk *Fast In-Memory Runtime Compilation of SYCL Code* at IWOCL
+2025: [Slides](https://www.iwocl.org/wp-content/uploads/iwocl-2025-julian-oppermann-fast-in-memory-runtime.pdf) [Video Recording](https://youtu.be/X9mS8xetZJY)
+
+## SYCL-RTC refresher
+
+SYCL-RTC means using
+the [`kernel_compiler`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc)
+extension to wrap a SYCL source string comprised of kernel definitions in
+the [free-function syntax](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc)
+into a `kernel_bundle` in the `ext_oneapi_source` state, which is then compiled into `exectuable` state by the
+extension's `build(...)` function.
+
+```cpp
+#include <sycl/sycl.hpp>
+namespace syclexp = sycl::ext::oneapi::experimental;
+
+// ...
+
+std::string sycl_source = R"""(
+  #include <sycl/sycl.hpp>
+  
+  extern "C" SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((
+    sycl::ext::oneapi::experimental::nd_range_kernel<1>))
+  void vec_add(float* in1, float* in2, float* out){
+    size_t id = sycl::ext::oneapi::this_work_item::get_nd_item<1>()
+                .get_global_linear_id();
+    out[id] = in1[id] + in2[id];
+  }
+)""";
+
+sycl::queue q;
+
+auto source_bundle = syclexp::create_kernel_bundle_from_source(
+  q.get_context(), syclexp::source_language::sycl, sycl_source);
+
+// Read on to learn what happens in the next line!
+auto exec_bundle = syclexp::build(source_bundle);
+```
+
+But what happens in the background, and how does the SYCL runtime turn your SYCL code into an executable kernel when you
+call `build(...)`? These are the questions that we want to answer throughout the rest of this blog post.
+
+## An early prototype
+
+Our [first implementation](https://github.com/intel/llvm/blob/03cb2b25026f060149eb94c85b228e5b3a780588/sycl/source/detail/kernel_compiler/kernel_compiler_sycl.cpp#L254)
+of the `build(...)` function wrote the source string into a temporary file, invoked DPC++ on it in a special mode that
+dumped the device code to another file in SPIR-V format, and finally loaded that file back into the runtime, from where
+it was executed. The following figure shows the pipeline.
+
+![invoking.png]({{ '/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-1.png' | relative_url }})
+
+The DPC++ compiler is built on top of LLVM and its C/C++ frontend Clang. Internally, the compiler driver orchestrates
+the compilation of a SYCL code across multiple tools, connected by intermediate files in a temporary directory. While
+enthusiasts can find additional information [here](https://intel.github.io/llvm/design/CompilerAndRuntimeDesign.html),
+for understanding the rest of this post it is sufficient to know that device code is extracted and compiled by the
+SYCL-enabled frontend to LLVM's intermediate representation, linked with various device libraries (`llvm-link` in the
+figure), and post-processed by a mix of SYCL-specific transformation passes (`sycl-post-link`) before finally being
+translated into the target format, i.e. SPIR-V when targeting Intel devices (`llvm-spirv`).
+
+## The rationale for an in-memory compilation pipeline
+
+Invoking the DPC++ executable as outlined in the previous section worked reasonably well to implement the
+basic `kernel_compiler` extension, but we observed several shortcomings:
+
+- Functional completeness: Emitting a single SPIR-V file is sufficient for simple kernels, but more advanced device code
+  may result in multiple *device images* comprised of SPIR-V binaries and accompanying metadata (*runtime properties*)
+  that needs to be communicated to the runtime.
+- Robustness: Reading multiple dependent files from a temporary directory can be be fragile.
+- Performance: Multiple processes are launched by the compiler driver, and file I/O operations have a non-negligible
+  overhead.
+- Security: Reading executable code from disk is a security concern, and users of an RTC-enabled application may be
+  unaware that a compilation writing intermediate files is happening in the background.
+
+These challenges ultimately motivated the design of the **in-memory compilation pipeline** that is shown below and is
+now the default approach in DPC++ and the oneAPI product distribution since the 2025.2 release. This new approach
+leverages **modular compiler technology** to produce a faster, more feature-rich, more robust and safer implementation
+of the `kernel_compiler` extension.
+
+![invoking.png]({{ '/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-2.png' | relative_url }})
+
+The individual steps in the pipeline are now invoked programmatically via an API inside the same process, and
+intermediate results are passed along as objects in memory. Our implementation is part of the open-source DPCPP++
+repository [intel/llvm](https://github.com/intel/llvm) on GitHub, so you can find the code in
+the [`compileSYCL(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/RTC.cpp#L57)
+function. Let's dive into more detail the following sections!
+
+## Using the LibTooling API to compile the source string to an `llvm::Module`
+
+[LibTooling](https://clang.llvm.org/docs/LibTooling.html) is a high-level API to write standalone tools based on Clang,
+such as linters, refactoring tools or static analysers. To use it, one defines a *tool action* to run on a set of files
+in a *virtual filesystem overlay*, which the frontend then processes according to a *compilation command database*. The
+following figure outlines how we map a compilation request originating from the `kernel_compiler` extension to this API.
+
+![invoking.png]({{ '/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-3.png' | relative_url }})
+
+This might be a slightly unusual way to use of LibTooling, but we found it works great for SYCL-RTC. Let me show you how
+in this section by walking
+through [`jit_compiler::compileDeviceCode(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L418)
+function.
+
+### Step 1: Determine the path of the compiler installation
+
+To set up up working frontend invocation, we need to know where to find supplemental files such as the SYCL headers.
+Normally, these paths are determined relative to the compiler executable (e.g. `clang++` for the open-source DPC++),
+however in our case, the executable is actually the RTC-enabled application, which can reside in an arbitrary location.
+Instead, we use OS-specific logic
+inside [`getDPCPPRoot()`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L112)
+to determine the location of the shared library `sycl-jit.so` (or `.dll` on Windows) which contains the SYCL-RTC
+implementation. From its location, we can derive the compiler installation's root directory.
+
+### Step 2: Collect command-line arguments
+
+The next step is to collect the command-line arguments for the frontend invocation.
+The [`adjustArgs(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L320)
+function relies on Clang's option handling infrastructure to set the required options to enter the device compilation
+mode (`-fsycl-device-only`), set up the compiler environment, and select the target. Finally, any user-specified
+arguments passed via
+the [`build_options`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#new-properties-for-the-build-and-compile-functions)
+property are appended to the list of command-line arguments.
+
+### Step 3: Configure the `ClangTool`
+
+Once we know the required command-line arguments, we can set up the compilation command database and
+an [instance](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L433)
+of
+the [`ClangTool`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Tooling/Tooling.h#L317)
+class, which provides the entry point to the LibTooling interface. As we'll be translating only a single file containing
+the source string, we construct
+a [`FixedCompilationDatabase`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Tooling/CompilationDatabase.h#L154)
+relative to the current working directory.
+
+To implement the `kernel_compiler` extension cleanly, we need to capture all output (e.g. warnings and errors) from the
+frontend.
+The [`ClangDiagnosticsWrapper`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L274)
+class configures
+a [`TextDiagnosticsPrinter`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Frontend/TextDiagnosticPrinter.h#L27)
+to append all messages to a string maintained by our implementation to collect all output produced during the runtime
+compilation.
+
+The configuration of the `ClangTool` instance continues in
+the [`setupTool`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L353)
+function. First, we redirect all output to our diagnostics wrapper. Then,
+we [set up the overlay filesystem](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L361-L364)
+with a file named `rtc_<n>.cpp` (*n* is incremented for each use of the `kernel_compiler` extension's `build(...)`
+function) in the current directory with the contents of the source string. Each of the virtual header files that the
+application defined via
+the [`include_files`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#new-properties-for-the-create_kernel_bundle_from_source-function)
+property becomes also a file in the overlay filesystem, using the path specified in the property.
+
+The `ClangTool` class exposes so-called argument adjusters, which are intended to modify the command-line arguments
+coming from the compilation command database. We have
+to [clear](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L368)
+the default adjusters defined by the class, because one of them injects the `-fsyntax-only` flag, which would conflict
+with the `-fsycl-device-only` flag we need for SYCL-RTC. Finally,
+we [add](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L371)
+an argument adjuster ourselves to overwrite the name of executable in the invocation. Again, this is to help the correct
+detection of the environment, by making the invocation as similar as possible to a normal use of DPC++.
+
+### Step 4: Run an action
+
+The last step is to define
+a [`ToolAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Tooling/Tooling.h#L80)
+to be executed on the source files. Clang conveniently provides
+the [`EmitLLVMAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/CodeGen/CodeGenAction.h#L103),
+which runs the frontend up until the LLVM IR code generation, which is exactly what we need. However, LibTooling does
+not provides a helper to wrap it in a `ToolAction`, so we need to define and run our
+own [`GetLLVMModuleAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L241).
+
+We extracted common boilerplate code to configure
+a [`CompilerInstance`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Frontend/CompilerInstance.h#L81)
+in
+the [`RTCActionBase`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L176)
+class. Inside the `GetLLVMModuleAction`, we instantiate and execute the aforementioned `EmitLLVMAction`, and, in case
+the translation was
+successful, [obtains ownership](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L255)
+of the constructed `llvm::Module` from it.
+
+Finally, the call
+to [`Action.takeModule()`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L442)
+transfers ownership again to the caller of `compileDeviceCode`. Note that this simple mechanism works because we know
+that there is only a single compilation happening for every instance of the `ClangTool` and hence
+our `GetLLVMModuleAction` class.
+
+## Caching
+
+In our previous [blog post](https://codeplay.com/portal/blogs/2025/07/08/sycl-runtime-compilation), we noted that we
+implemented a persistent cache for the invocation of the frontend, which we observed to be the most expensive (in terms
+of runtime overhead) phase of our compilation pipeline. Let's have a closer look how the cache works.
+
+### Overall design
+
+We cache only the frontend invocation, meaning that after a successful translation, we store the LLVM IR module obtained
+via LibTooling on disk in the Bitcode format using built-in utilities. In case of a cache hit in a later runtime
+compilation, we load the module from disk and feed it into the device linking phase. The rationale for this design was
+that were no utilities to save and restore the linked and post-processed device images to disk at the time (
+the [SYCLBIN](https://intel.github.io/llvm/design/SYCLBINDesign.html) infrastructure was added later), and caching these
+steps would have resulted only in marginal further runtime savings.
+
+### Cache key considerations
+
+The main challenge is to define a robust cache key. Because code compiled via SYCL-RTC can `#include` header files
+defined via the `include_files` property as well as from the filesystem, e.g. `sycl.hpp` from the DPC++ installation or
+user libraries, it is not sufficient to look only at the source string. In order to make the cache as conservative as
+possible (cache collisions are unlikely but mathematically possible), we decided to compute a hash value of the
+*preprocessed* source string, i.e. with all `#include` directives resolved. We additionally compute a hash value of the
+rendered command-line arguments, and append it to the hash of the preprocessed source to obtain the final cache key.
+
+### Implementation notes
+
+The cache key computation is implemented in
+the [`jit_compiler::calculateHash(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L381)
+function. We are again relying on LibTooling to invoke the preprocessor - handily, Clang provides
+a [`PreprocessorFrontendAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Frontend/FrontendAction.h#L294)
+that we extend to tailor to our use-case.
+We choose [BLAKE3](https://en.wikipedia.org/wiki/BLAKE_(hash_function)) as the hash algorithm because its proven in
+similar contexts (most notably, [ccache](https://ccache.dev)) and available as a utility in the LLVM ecosystem. As the
+output is a byte array, we apply Base64 encoding to obtain a character string for use with the persistent cache.
+
+## Device library linking and SYCL-specific transformations
+
+With an LLVM IR module in hand, obtained either from the frontend or the cache, the next steps in the compilation
+pipeline are simple (at least for compiler folks 😉). As these steps are mostly SYCL-specific, we won't go into as much
+detail as in the previous section.
+
+The device library linking is done by
+the [`jit_compiler::linkDeviceLibraries(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L566)
+function. These libraries provide primitives for a variety of extra functionality, such as an extended set of math
+functions and support for `bfloat16` arithmetic, and are available as Bitcode files inside the DPC++ installation or the
+vendor toolchain, so we just use LLVM utilities to load them into memory and link them to the module representing the
+runtime-compiled kernels.
+
+For the SYCL-specific post-processing, implemented
+in [`jit_compiler::performPostLink(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L750),
+we can reuse modular analysis and transformation passes in
+the [`SYCLLowerIR`](https://github.com/intel/llvm/tree/sycl/llvm/lib/SYCLLowerIR) component. The main tasks for the
+post-processing passes is to split the device code module into smaller units (either as requested by the user, or
+required by the ESIMD mode), and to compute the properties that need to be passed to the SYCL runtime when the device
+images are loaded.
+
+## Translation to the target format
+
+The [final phase](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/RTC.cpp#L139)
+in the pipeline is to translate the LLVM IR modules resulting from the previous phase into a device-specific target
+format that can be handled by the runtime. For Intel CPUs and GPUs, that's binary SPIR-V. For AMD and NVIDIA GPUs, we
+emit AMDGCN and PTX assembly, respectively. Over time, we created our own set
+of [utilities](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/translation/Translation.h)
+to facilitate the translation. Internally, we dispatch the task to either the SPIR-V translator (a copy of which is
+maintained inside the DPC++ repository), or use vendor-specific backends that are part of LLVM to generate the
+third-party GPU code.
+
+## New hardware support
+
+You might be surprised that we talked about target formats for AMD and NVIDIA GPUs in the previous paragraph - in our
+IWOCL talk, we still said that SYCL-RTC only works on Intel hardware. Well, that is another exciting bit of news that we
+can share in this blog post: We have recently enabled support for SYCL-RTC on AMD and NVIDIA GPUs! The usage of
+the `kernel_compiler` extension remains the same for SYCL devices representing such a third-party GPU. The concrete GPU
+architecture is queried via the environment variable `SYCL_JIT_AMDGCN_PTX_TARGET_CPU` when executing the RTC-enabled
+application. For AMD GPUs, it is **mandatory** to set it. For NVIDIA GPUs, it is highly recommended to change it from
+the conservative default architecture (`sm_50`).
+
+```shell
+$ clang++ -fsycl myapp.cpp -o myapp
+$ SYCL_JIT_AMDGCN_PTX_TARGET_CPU=sm_90 ./myapp
+```
+
+A list of values that can be set as the target CPU can be found in
+the [documentation of the `-fsycl-targets=` option](https://intel.github.io/llvm/UsersManual.html#generic-options) (
+leave out the `amd_gpu_` and `nvidia_gpu_` prefixes).
+
+At the moment, the support is available in [daily builds](https://github.com/intel/llvm/releases) of the open-source
+version of DPC++.
+
+## Conclusion
+
+Thanks for reading on until here! As you can see, our approach to SYCL-RTC is built as a clever combination over
+existing components and utilities. To recap, we leverage:
+
+- DPC++'s SYCL-extended Clang version via LibTooling to preprocess and compile SYCL device code
+- LLVM utilities for reading and writing Bitcode files
+- LLVM utilities for computing the BLAKE3 hash value and a Base64 encoding of the result
+- SYCL-specific passes for post-processing
+- SPIR-V translator, as well as the LLVM backends for AMD and NVIDIA GPUs
+
+It is clear that without the prevalence of modular design practices in modern compilers, shipping the in-memory pipeline
+in a just a few months would have been impossible. An additional soft benefit is that reusing as much as possible of the
+existing codebase minimises differences and user surprises between "normal" and runtime compilation.
+
+## The team behind SYCL-RTC
+
+The work presented in this blog post is a joint effort by
+
+- Julian Oppermann, Lukas Sommer and Jakub Chlanda at Codeplay Software, and
+- Chris Perkins, Steffen Larsen, Alexey Sachkov and Greg Lueck at Intel.
diff --git a/assets/images/portal/article-images/2025-07-08-sycl-runtime-compilation/thumbnail.webp b/assets/images/portal/article-images/2025-07-08-sycl-runtime-compilation/thumbnail.webp
new file mode 100644
index 0000000..2a5701c
Binary files /dev/null and b/assets/images/portal/article-images/2025-07-08-sycl-runtime-compilation/thumbnail.webp differ
diff --git a/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-1.png b/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-1.png
new file mode 100644
index 0000000..4dc18ff
Binary files /dev/null and b/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-1.png differ
diff --git a/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-2.png b/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-2.png
new file mode 100644
index 0000000..8c7cca4
Binary files /dev/null and b/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-2.png differ
diff --git a/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-3.png b/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-3.png
new file mode 100644
index 0000000..aaa9b2d
Binary files /dev/null and b/assets/images/portal/article-images/2025-08-20-sycl-rtc/graph-3.png differ
diff --git a/assets/images/portal/article-images/2025-08-20-sycl-rtc/thumbnail.webp b/assets/images/portal/article-images/2025-08-20-sycl-rtc/thumbnail.webp
new file mode 100644
index 0000000..ae60626
Binary files /dev/null and b/assets/images/portal/article-images/2025-08-20-sycl-rtc/thumbnail.webp differ