encoding.md

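CAMLprim value caml_ml_string_length(value s)
{
  mlsize_t temp;
  /* Index of the last byte of the block (block size in bytes, minus one). */
  temp = Bosize_val(s) - 1;
  /* The last byte holds the number of padding bytes minus one, so
     temp - Byte (s, temp) is the string length; the byte there must be 0. */
  Assert (Byte (s, temp - Byte (s, temp)) == 0);
  return Val_long(temp - Byte (s, temp));
}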

Like all heap blocks, a string has a header giving the size of its block in machine words. The actual block contents are:

  • the characters of the string
  • padding bytes to align the block on a word boundary.
    The padding is one of 00, 00 01, 00 00 02, or 00 00 00 03 on a 32-bit machine, and up to 00 00 .... 07 on a 64-bit machine.
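
The layout described above can be sketched in a few lines of C. This is only an illustration, not the runtime's own allocation code: `padded_size` and `layout_string` are hypothetical helpers, `word` stands for the machine word size in bytes (4 or 8), and the block header is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes of the block body for a string of
   `len` bytes. There is always at least one padding byte, so this is
   len + 1 rounded up to a multiple of the word size. */
size_t padded_size(size_t len, size_t word)
{
    return (len / word + 1) * word;
}

/* Hypothetical helper: write `len` bytes of `src` followed by the padding
   into `dst`, which must hold padded_size(len, word) bytes.
   Returns the size of the block body in bytes. */
size_t layout_string(unsigned char *dst, const unsigned char *src,
                     size_t len, size_t word)
{
    size_t size = padded_size(len, word);
    size_t pad = size - len;            /* between 1 and `word` bytes */
    assert(pad >= 1 && pad <= word);
    memcpy(dst, src, len);
    memset(dst + len, 0, pad);          /* padding bytes are zero ... */
    dst[size - 1] = (unsigned char)(pad - 1); /* ... except the last one */
    return size;
}
```

With `word` set to 4, a three-character string gets the single padding byte 00, and a four-character string gets a whole extra word, 00 00 00 03.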

Thus, the string is always zero-terminated, and its length can be computed as follows:

number_of_words_in_block * sizeof(word) - last_byte_of_block - 1

The null termination comes in handy when passing a string to C, but the length computation (in Caml) does not rely on it, so a string may contain null bytes.
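
As a rough stand-alone sketch of that formula, the function below works on a plain byte array instead of a Caml value. The name `string_length_from_block` and its parameters are assumptions made for illustration: `block` points at the block contents, `words` is the word count taken from the header, and `word` is the word size in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone version of the length formula:
   number_of_words_in_block * sizeof(word) - last_byte_of_block - 1 */
size_t string_length_from_block(const unsigned char *block,
                                size_t words, size_t word)
{
    size_t last = words * word - 1;   /* index of the last byte of the block */
    size_t len = last - block[last];  /* subtract the padding count */
    assert(block[len] == 0);          /* first padding byte is the terminator */
    return len;
}
```

Because only the last byte of the block is consulted, a block holding 'x', 00, 'y' followed by the padding 00 00 00 00 04 still yields length 3, even though C's `strlen` would stop at the first null and report 1.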

For example, on a 64-bit machine (8-byte words):

  "" -> 8 - 7 - 1 = 0
  "a" -> 8 - 6 - 1 = 1
  "0123456" -> 8 - 0 - 1 = 7
  "01234567" -> 2 * 8 - 7 - 1 = 8