Compact ImageMap Format

A process's address space contains (among other things) the set of dynamically loaded images that have been mapped into it. When generating crash logs or symbolicating backtraces, we need to be able to capture, and potentially store, the list of images that have been loaded, along with some attributes of those images, including each image's:

- Path
- Build ID (aka UUID)
- Base address
- End-of-text address

Compact ImageMap Format (CIF) is a binary format for holding this information.

General Format

Compact ImageMap Format data is byte aligned and starts with an information byte:

   7   6   5   4   3   2   1   0
 ┌───────────────────────┬───────┐
 │ version               │ size  │
 └───────────────────────┴───────┘

The version field identifies the version of CIF that is in use; this document describes version 0. The size field is encoded as follows:

| `size` | Machine word size |
| ------ | ----------------- |
| 00     | 16-bit            |
| 01     | 32-bit            |
| 10     | 64-bit            |
| 11     | Reserved          |
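
As a minimal sketch of how a reader might split the information byte, assuming the bit layout in the diagram above (version in bits 7 to 2, size in bits 1 to 0), the following hypothetical helper `parse_info_byte` returns the version and the word size in bits. The function and constant names are illustrative and not part of the format.

```python
# Word-size codes from the table above; 0b11 is reserved.
WORD_SIZES = {0b00: 16, 0b01: 32, 0b10: 64}


def parse_info_byte(info: int) -> tuple:
    """Split the information byte into (version, word size in bits)."""
    version = info >> 2        # bits 7..2
    size_code = info & 0b11    # bits 1..0
    if size_code not in WORD_SIZES:
        raise ValueError("reserved word-size code 0b11")
    return version, WORD_SIZES[size_code]
```

For instance, an information byte of `0x02` would describe version 0 data from a 64-bit machine.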

This is followed immediately by a field containing the name of the platform that generated this image map. This field consists of a single byte length followed by a UTF-8 string of that length.
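
The platform name field can be sketched as follows. This assumes the length byte counts UTF-8 bytes rather than characters, which the text does not state explicitly; the helper names and the `"macOS"` value used below are hypothetical.

```python
def encode_platform(name: str) -> bytes:
    """Encode a platform name as one length byte plus UTF-8 data."""
    raw = name.encode("utf-8")
    if len(raw) > 255:
        raise ValueError("platform name does not fit a one-byte length")
    return bytes([len(raw)]) + raw


def decode_platform(data: bytes, offset: int = 0) -> tuple:
    """Return (platform name, offset of the byte after the field)."""
    length = data[offset]
    start = offset + 1
    end = start + length
    if end > len(data):
        raise ValueError("truncated platform name")
    return data[start:end].decode("utf-8"), end
```

Under these assumptions, `encode_platform("macOS")` yields the six bytes `05 6d 61 63 4f 53`.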

After that is a field encoding the number of images in the image map. This field is encoded as a sequence of bytes, most significant first, each holding seven bits of data; the top bit is set on every byte except the final one, where it is clear. For example:

| `count` | Encoding    |
| ------- | ----------- |
| 0       | 00          |
| 1       | 01          |
| 127     | 7f          |
| 128     | 81 00       |
| 129     | 81 01       |
| 700     | 85 3c       |
| 1234    | 89 52       |
| 16384   | 81 80 00    |
| 65535   | 83 ff 7f    |
| 2097152 | 81 80 80 00 |
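
The count encoding above can be sketched in a few lines. The function names `encode_count` and `decode_count` are illustrative only; the behaviour is intended to reproduce the rows of the table.

```python
def encode_count(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, most significant first."""
    if value < 0:
        raise ValueError("count must be non-negative")
    groups = [value & 0x7F]                   # final byte: top bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))  # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(groups))


def decode_count(data: bytes, offset: int = 0) -> tuple:
    """Return (value, offset of the byte after the encoded count)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                   # top bit clear marks the end
            return value, offset
```

The decoder returns the next offset so that a caller can continue reading the fields that follow.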

This in turn is followed by the list of images, stored in order of increasing base address. For each image, we start with a header byte:

   7   6   5   4   3   2   1   0
 ┌───┬───┬───────────┬───────────┐
 │ r │ 0 │ acount    │ ecount    │
 └───┴───┴───────────┴───────────┘

If r is set, then the base address is understood to be relative to the previously computed base address.

This byte is followed by acount + 1 bytes of base address, then ecount + 1 bytes of offset to the end of text.
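
A rough sketch of reading the header byte and the two address fields follows. The text does not specify a byte order for these fields, so big-endian is an assumption made here purely for illustration, and `decode_image_addresses` is a hypothetical name. The end-of-text offset is returned as stored, without being resolved to an absolute address.

```python
def decode_image_addresses(data: bytes, offset: int, previous_base: int) -> tuple:
    """Return (base address, end-of-text offset, next offset) for one image."""
    header = data[offset]
    relative = bool(header & 0x80)   # r, bit 7
    acount = (header >> 3) & 0x07    # bits 5..3
    ecount = header & 0x07           # bits 2..0
    offset += 1

    if offset + acount + ecount + 2 > len(data):
        raise ValueError("truncated image header")

    # ASSUMPTION: multi-byte fields are big-endian; the format text is silent.
    base = int.from_bytes(data[offset:offset + acount + 1], "big")
    offset += acount + 1
    end_of_text = int.from_bytes(data[offset:offset + ecount + 1], "big")
    offset += ecount + 1

    if relative:
        base += previous_base        # stored relative to the previous base
    return base, end_of_text, offset
```

For the first image there is no previous base address, so a caller would presumably pass `0` for `previous_base`.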

Following this is an encoded count of bytes in the build ID, encoded using the 7-bit scheme we used to encode the image count, and then after that come the build ID bytes themselves.
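
Reading the build ID might look like the sketch below, which repeats a small 7-bit count decoder so that it is self-contained. Both function names are illustrative.

```python
def decode_count(data: bytes, offset: int) -> tuple:
    """Decode a 7-bit-group count; returns (value, next offset)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, offset


def decode_build_id(data: bytes, offset: int = 0) -> tuple:
    """Return (build ID bytes, offset of the byte after the build ID)."""
    length, offset = decode_count(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated build ID")
    return data[offset:end], end
```

A 16-byte UUID therefore costs 17 bytes in total: one length byte (`10`) followed by the UUID itself.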

Finally, we encode the path string using the scheme below.

String Encoding

Image paths contain a good deal of redundancy; paths are therefore encoded using a prefix compression scheme. The basic idea here is that while generating or reading the data, we maintain a mapping from small integers to path prefix segments.

The mapping is initialised with the following fixed list, which never needs to be stored in CIF data:

| code | Path prefix                         |
| ---- | ----------------------------------- |
| 0    | `/lib`                              |
| 1    | `/usr/lib`                          |
| 2    | `/usr/local/lib`                    |
| 3    | `/opt/lib`                          |
| 4    | `/System/Library/Frameworks`        |
| 5    | `/System/Library/PrivateFrameworks` |
| 6    | `/System/iOSSupport`                |
| 7    | `/Library/Frameworks`               |
| 8    | `/System/Applications`              |
| 9    | `/Applications`                     |
| 10   | `C:\Windows\System32`               |
| 11   | `C:\Program Files`                  |

Codes below 32 are reserved for future expansion of the fixed list.
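
One way to model the mapping is sketched below: a table seeded with the fixed list, with newly learned prefixes numbered from 32 upwards. The class `PrefixTable` and its methods are hypothetical names chosen for this illustration.

```python
FIXED_PREFIXES = {
    0: "/lib",
    1: "/usr/lib",
    2: "/usr/local/lib",
    3: "/opt/lib",
    4: "/System/Library/Frameworks",
    5: "/System/Library/PrivateFrameworks",
    6: "/System/iOSSupport",
    7: "/Library/Frameworks",
    8: "/System/Applications",
    9: "/Applications",
    10: "C:\\Windows\\System32",
    11: "C:\\Program Files",
}

FIRST_DYNAMIC_CODE = 32  # codes below 32 are reserved for the fixed list


class PrefixTable:
    """Maps small integer codes to path prefixes."""

    def __init__(self):
        self.prefixes = dict(FIXED_PREFIXES)
        self.next_code = FIRST_DYNAMIC_CODE

    def add(self, prefix: str) -> int:
        """Assign the next available code to a prefix and return that code."""
        code = self.next_code
        self.prefixes[code] = prefix
        self.next_code += 1
        return code

    def lookup(self, code: int) -> str:
        return self.prefixes[code]
```

Because the writer and the reader both start from the same fixed list and add prefixes in the same order, the table itself never has to be serialised.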

Strings are encoded as a sequence of bytes, as follows:

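| `opcode`   | Mnemonic  | Meaning                          |
| ---------- | --------- | -------------------------------- |
| `00000000` | `end`     | Marks the end of the string      |
| `00xxxxxx` | `str`     | Raw string data                  |
| `01xxxxxx` | `framewk` | Names a framework                |
| `1exxxxxx` | `expand`  | Identifies a prefix in the table |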
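
The opcode table can be read as a simple dispatch on the top two bits, with the all-zero byte checked first. A minimal sketch, using the hypothetical name `classify_opcode`:

```python
def classify_opcode(byte: int) -> str:
    """Return the mnemonic for a string-encoding opcode byte."""
    if byte == 0x00:
        return "end"       # 00000000
    if byte & 0x80:
        return "expand"    # 1exxxxxx
    if byte & 0x40:
        return "framewk"   # 01xxxxxx
    return "str"           # 00xxxxxx with a non-zero count
```

Note that `end` is the same bit pattern as a `str` with a count of zero, which is why it must be tested before the `str` case.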

`end`

Encoding

   7   6   5   4   3   2   1   0
 ┌───────────────────────────────┐
 │ 0   0   0   0   0   0   0   0 │  end
 └───────────────────────────────┘

Meaning

Marks the end of the string

`str`

Encoding

   7   6   5   4   3   2   1   0
 ┌───────┬───────────────────────┐
 │ 0   0 │ count                 │  str
 └───────┴───────────────────────┘

Meaning

The next count bytes are included in the string verbatim. Additionally, all path prefixes of this string data will be added to the current prefix table. For instance, if the string data is /swift/linux/x86_64/libfoo.so, then the prefix /swift will be assigned the next available code, /swift/linux the code after that, and /swift/linux/x86_64 the code following that one.
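
The prefix registration step can be sketched as follows. This assumes `/` is the only separator considered and makes no attempt to skip prefixes that are already in the table, since the text does not say whether duplicates are filtered. The helper names are hypothetical, and the table is modelled as a plain dictionary.

```python
def path_prefixes(data: str) -> list:
    """List every directory prefix of the string data, shortest first."""
    return [data[:i] for i, ch in enumerate(data) if ch == "/" and i > 0]


def register_prefixes(table: dict, next_code: int, data: str) -> int:
    """Assign consecutive codes to each prefix; returns the new next code."""
    for prefix in path_prefixes(data):
        table[next_code] = prefix
        next_code += 1
    return next_code
```

With the string data from the paragraph above and a next available code of 32, this assigns codes 32, 33 and 34 to `/swift`, `/swift/linux` and `/swift/linux/x86_64` respectively.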

`framewk`

Encoding

   7   6   5   4   3   2   1   0
 ┌───────┬───────────────────────┐
 │ 0   1 │ count                 │  framewk
 └───────┴───────────────────────┘

Meaning

The next byte is a version character (normally A, but some frameworks use higher characters), after which there are count + 1 bytes of name.

This is expanded using the pattern /<name>.framework/Versions/<version>/<name>. This also marks the end of the string.
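
As an illustration of the `framewk` opcode, the sketch below encodes and decodes a framework reference. It assumes the name is UTF-8 and the version is a single ASCII character; the function names are hypothetical.

```python
def encode_framewk(version: str, name: str) -> bytes:
    """Encode a framework as opcode byte, version character, then name."""
    raw = name.encode("utf-8")
    if not 1 <= len(raw) <= 64:
        raise ValueError("framework name must be 1 to 64 bytes long")
    return bytes([0x40 | (len(raw) - 1)]) + version.encode("ascii") + raw


def decode_framewk(data: bytes, offset: int = 0) -> tuple:
    """Return (expanded path suffix, offset after the framewk data)."""
    opcode = data[offset]
    if (opcode & 0xC0) != 0x40:
        raise ValueError("not a framewk opcode")
    name_length = (opcode & 0x3F) + 1          # count + 1 bytes of name
    start = offset + 2
    end = start + name_length
    if end > len(data):
        raise ValueError("truncated framewk data")
    version = chr(data[offset + 1])
    name = data[start:end].decode("utf-8")
    return f"/{name}.framework/Versions/{version}/{name}", end
```

For example, version `C` of `AppKit` encodes to `45` followed by `CAppKit`, eight bytes standing in for a 35-character suffix.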

`expand`

Encoding

   7   6   5   4   3   2   1   0
 ┌───┬───┬───────────────────────┐
 │ 1 │ e │ code                  │  expand
 └───┴───┴───────────────────────┘

Meaning

If e is 0, code is the index into the prefix table for the prefix that should be appended to the string at this point.

If e is 1, this opcode is followed by code + 1 bytes that give a value v such that v + 64 is the index into the prefix table for the prefix that should be appended to the string at this point.
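
The two forms of `expand` can be sketched as below. The byte order of the value v is not stated in the text, so big-endian is assumed here for illustration only; `encode_expand` and `decode_expand` are hypothetical names.

```python
def encode_expand(index: int) -> bytes:
    """Encode a prefix-table index using the shortest expand form."""
    if index < 0:
        raise ValueError("index must be non-negative")
    if index < 64:
        return bytes([0x80 | index])            # e = 0: code is the index
    v = index - 64
    n = max(1, (v.bit_length() + 7) // 8)       # bytes needed to hold v
    if n > 64:
        raise ValueError("index too large")
    # ASSUMPTION: v is stored big-endian; the format text is silent.
    return bytes([0xC0 | (n - 1)]) + v.to_bytes(n, "big")


def decode_expand(data: bytes, offset: int = 0) -> tuple:
    """Return (prefix-table index, offset after the expand opcode)."""
    opcode = data[offset]
    if not opcode & 0x80:
        raise ValueError("not an expand opcode")
    code = opcode & 0x3F
    offset += 1
    if not opcode & 0x40:                        # e = 0
        return code, offset
    n = code + 1                                 # e = 1: code + 1 bytes of v
    if offset + n > len(data):
        raise ValueError("truncated expand data")
    v = int.from_bytes(data[offset:offset + n], "big")
    return v + 64, offset + n
```

The bias of 64 means the extended form never wastes encodings on indices that the single-byte form can already express.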

Example

Let's say we wish to encode the following strings:

/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
/System/Library/Frameworks/Photos.framework/Versions/A/Photos
/usr/lib/libobjc.A.dylib
/usr/lib/libz.1.dylib
/usr/lib/swift/libswiftCore.dylib
/usr/lib/libSystem.B.dylib
/usr/lib/libc++.1.dylib

We would encode

<84> <45> CAppKit <00>

We then follow with

<84> <45> APhotos <00>

Next we have

<81> <10> /libobjc.A.dylib <00>
<81> <0d> /libz.1.dylib <00>
<81> <19> /swift/libswiftCore.dylib <00>

assigning code 32 to /swift, then

<81> <12> /libSystem.B.dylib <00>
<81> <0f> /libc++.1.dylib <00>

Stored as NUL-terminated strings, the original data would have taken up 256 bytes. Instead, we have used 122 bytes, a saving of over 50%.
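
The figures in this example can be checked with a short script. The sketch below is not a reference decoder: it carries only the two fixed prefixes the example needs, omits the `e = 1` form of `expand`, treats `framewk` as ending the string (so the `<00>` that follows it is left unread), and registers prefixes of the raw `str` data exactly as described earlier. All names are hypothetical.

```python
PATHS = [
    "/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit",
    "/System/Library/Frameworks/Photos.framework/Versions/A/Photos",
    "/usr/lib/libobjc.A.dylib",
    "/usr/lib/libz.1.dylib",
    "/usr/lib/swift/libswiftCore.dylib",
    "/usr/lib/libSystem.B.dylib",
    "/usr/lib/libc++.1.dylib",
]


def record(*parts) -> bytes:
    """Join opcode bytes (ints) and literal text (str) into one byte string."""
    return b"".join(
        bytes([p]) if isinstance(p, int) else p.encode("utf-8") for p in parts
    )


ENCODED = [
    record(0x84, 0x45, "CAppKit", 0x00),
    record(0x84, 0x45, "APhotos", 0x00),
    record(0x81, 0x10, "/libobjc.A.dylib", 0x00),
    record(0x81, 0x0D, "/libz.1.dylib", 0x00),
    record(0x81, 0x19, "/swift/libswiftCore.dylib", 0x00),
    record(0x81, 0x12, "/libSystem.B.dylib", 0x00),
    record(0x81, 0x0F, "/libc++.1.dylib", 0x00),
]


def decode_string(data: bytes, table: dict, next_code: int) -> tuple:
    """Decode one encoded path; returns (path, updated next code)."""
    path = ""
    i = 0
    while i < len(data):
        op = data[i]
        i += 1
        if op == 0x00:                           # end
            break
        if op & 0x80:                            # expand
            if op & 0x40:
                raise NotImplementedError("e = 1 form omitted from this sketch")
            path += table[op & 0x3F]
        elif op & 0x40:                          # framewk, also ends the string
            version = chr(data[i])
            name = data[i + 1:i + 2 + (op & 0x3F)].decode("utf-8")
            path += f"/{name}.framework/Versions/{version}/{name}"
            break
        else:                                    # str
            count = op & 0x3F
            text = data[i:i + count].decode("utf-8")
            i += count
            path += text
            for j, ch in enumerate(text):        # register directory prefixes
                if ch == "/" and j > 0:
                    table[next_code] = text[:j]
                    next_code += 1
    return path, next_code


# Only the fixed prefixes used by this example.
table = {1: "/usr/lib", 4: "/System/Library/Frameworks"}
next_code = 32
decoded = []
for encoded in ENCODED:
    path, next_code = decode_string(encoded, table, next_code)
    decoded.append(path)

raw_size = sum(len(p.encode("utf-8")) + 1 for p in PATHS)  # +1 for each NUL
encoded_size = sum(len(e) for e in ENCODED)
```

Here `raw_size` counts each path plus a terminating NUL byte, and `encoded_size` counts every byte of the encoded records, including the `<00>` terminators.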
