arrow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject [01/17] arrow git commit: ARROW-3: This patch includes a WIP draft specification document for the physical Arrow memory layout produced over a series of discussions amongst the to-be Arrow committers during late 2015. There are also a few small PNG diagr
Date Wed, 17 Feb 2016 12:39:36 GMT
Repository: arrow
Updated Branches:
  refs/heads/master d5aa7c466 -> 23c4b08d1

ARROW-3: This patch includes a WIP draft specification document for the physical Arrow memory
layout produced over a series of discussions amongst the to-be Arrow committers during late
2015. There are also a few small PNG diagrams that illustrate some of the Arrow layout concepts.


Branch: refs/heads/master
Commit: 16e44e3d456219c48595142d0a6814c9c950d30c
Parents: fa5f029
Author: Wes McKinney <>
Authored: Tue Feb 16 16:02:46 2016 -0800
Committer: Jacques Nadeau <>
Committed: Wed Feb 17 04:38:39 2016 -0800

 format/                           | 253 ++++++++++++++++++++++++
 format/                           |   5 +
 format/diagrams/layout-dense-union.png     | Bin 0 -> 47999 bytes
 format/diagrams/layout-list-of-list.png    | Bin 0 -> 40105 bytes
 format/diagrams/layout-list-of-struct.png  | Bin 0 -> 60600 bytes
 format/diagrams/layout-list.png            | Bin 0 -> 15906 bytes
 format/diagrams/layout-primitive-array.png | Bin 0 -> 10907 bytes
 format/diagrams/layout-sparse-union.png    | Bin 0 -> 43020 bytes
 8 files changed, 258 insertions(+)
diff --git a/format/ b/format/
new file mode 100644
index 0000000..c393163
--- /dev/null
+++ b/format/
@@ -0,0 +1,253 @@
+# Arrow: Physical memory layout
+## Definitions / Terminology
+Since different projects have used differents words to describe various
+concepts, here is a small glossary to help disambiguate.
+* Array: a sequence of values with known length all having the same type.
+* Slot or array slot: a single logical value in an array of some particular data type
+* Contiguous memory region: a sequential virtual address space with a given
+  length. Any byte can be reached via a single pointer offset less than the
+  region’s length.
+* Primitive type: a data type that occupies a fixed-size memory slot specified
+  in bit width or byte width
+* Nested or parametric type: a data type whose full structure depends on one or
+  more other child relative types. Two fully-specified nested types are equal
+  if and only if their child types are equal. For example, `List<U>` is distinct
+  from `List<V>` iff U and V are different relative types.
+* Relative type or simply type (unqualified): either a specific primitive type
+  or a fully-specified nested type. When we say slot we mean a relative type
+  value, not necessarily any physical storage region.
+* Logical type: A data type that is implemented using some relative (physical)
+  type. For example, a Decimal value stored in 16 bytes could be stored in a
+  primitive array with slot size 16 bytes. Similarly, strings can be stored as
+  `List<1-byte>`.
+* Parent and child arrays: names to express relationships between physical
+  value arrays in a nested type structure. For example, a `List<T>`-type parent
+  array has a T-type array as its child (see more on lists below).
+* Leaf node or leaf: A primitive value array that may or may not be a child
+  array of some array with a nested type.
+## Requirements, goals, and non-goals
+Base requirements
+* A physical memory layout enabling zero-deserialization data interchange
+  amongst a variety of systems handling flat and nested columnar data, including
+  such systems as Spark, Drill, Impala, Kudu, Ibis, Spark, ODBC protocols, and
+  proprietary systems that utilize the open source components.
+* All array slots are accessible in constant time, with complexity growing
+  linearly in the nesting level
+* Capable of representing fully-materialized and decoded / decompressed Parquet
+  data
+* All leaf nodes (primitive value arrays) use contiguous memory regions
+* Each relative type can be nullable or non-nullable
+* Arrays are immutable once created. Implementations can provide APIs to mutate
+  an array, but applying mutations will require a new array data structure to
+  be built.
+* Arrays are relocatable (e.g. for RPC/transient storage) without pointer
+  swizzling. Another way of putting this is that contiguous memory regions can
+  be migrated to a different address space (e.g. via a memcpy-type of
+  operation) without altering their contents.
+## Goals (for this document)
+* To describe relative types (physical value types and a preliminary set of
+  nested types) sufficient for an unambiguous implementation
+* Memory layout and random access patterns for each relative type
+* Null representation for nullable types
+## Non-goals (for this document
+* To enumerate or specify logical types that can be implemented as primitive
+  (fixed-width) value types. For example: signed and unsigned integers,
+  floating point numbers, boolean, exact decimals, date and time types,
+  CHAR(K), VARCHAR(K), etc.
+* To specify standardized metadata or a data layout for RPC or transient file
+  storage.
+* To define a selection or masking vector construct
+* Implementation-specific details
+* Details of a user or developer C/C++/Java API.
+* Any “table” structure composed of named arrays each having their own type or
+  any other structure that composes arrays.
+* Any memory management or reference counting subsystem
+* To enumerate or specify types of encodings or compression support
+## Array lengths
+Any array has a known and fixed length, stored as a 32-bit signed integer, so a
+maximum of 2^31 - 1 elements. We choose a signed int32 for a couple reasons:
+* Enhance compatibility with Java and client languages which may have varying quality of
support for unsigned integers.
+* To encourage developers to compose smaller arrays (each of which contains
+  contiguous memory in its leaf nodes) to create larger array structures
+  possibly exceeding 2^31 - 1 elements, as opposed to allocating very large
+  contiguous memory blocks.
+## Nullable and non-nullable arrays
+Any relative type can be nullable or non-nullable.
+Nullable arrays have a contiguous memory buffer, known as the null bitmask,
+whose length is large enough to have 1 bit for each array slot. Whether any
+array slot is null is encoded in the respective bits of this bitmask, i.e.:
+is_null[j] -> bitmask[j / 8] & (1 << (j % 8))
+Physically, non-nullable (NN) arrays do not have a null bitmask.
+For nested types, if the top-level nested type is nullable, it has its own
+bitmask regardless of whether the child types are nullable.
+## Primitive value arrays
+A primitive value array represents a fixed-length array of values each having
+the same physical slot width typically measured in bytes, though the spec also
+provides for bit-packed types (e.g. boolean values encoded in bits).
+Internally, the array contains a contiguous memory buffer whose total size is
+equal to the slot width multiplied by the array length. For bit-packed types,
+the size is rounded up to the nearest byte.
+The associated null bitmask (for nullable types) is contiguously allocated (as
+described above) but does not need to be adjacent in memory to the values
+(diagram not to scale)
+<img src="diagrams/layout-primitive-array.png" width="400"/>
+## List type
+List is a nested type in which each array slot contains a variable-size
+sequence of values all having the same relative type (heterogeneity can be
+achieved through unions, described later).
+A list type is specified like `List<T>`, where `T` is any relative type
+(primitive or nested).
+A list-array is represented by the combination of the following:
+* A values array, a child array of type T. T may also be a nested type.
+* An offsets array containing 32-bit signed integers with length equal to the
+  length of the top-level array plus one. Note that this limits the size of the
+  values array to 2^31 -1.
+The offsets array encodes a start position in the values array, and the length
+of the value in each slot is computed using the first difference with the next
+element in the offsets array. For example. the position and length of slot j is
+computed as:
+slot_position = offsets[j]
+slot_length = offsets[j + 1] - offsets[j]  // (for 0 <= j < length)
+The first value in the offsets array is 0, and the last element is the length
+of the values array.
+Let’s consider an example, the type `List<Char>`, where Char is a 1-byte
+logical type.
+For an array of length 3 with respective values:
+[[‘j’, ‘o’, ‘e’], null, [‘m’, ‘a’, ‘r’, ‘k’]]
+We have the following offsets and values arrays
+<img src="diagrams/layout-list.png" width="400"/>
+Let’s consider an array of a nested type, `List<List<byte>>`
+<img src="diagrams/layout-list-of-list.png" width="400"/>
+## Struct type
+A struct is a nested type parameterized by an ordered sequence of relative
+types (which can all be distinct), called its fields.
+Typically the fields have names, but the names and their types are part of the
+type metadata, not the physical memory layout.
+A struct does not have any additional allocated physical storage.
+Physically, a struct type has one child array for each field.
+For example, the struct (field names shown here as strings for illustration
+Struct [nullable] <
+  name: String (= List<char>) [nullable],
+  age: Int32 [not-nullable]
+has two child arrays, one List<char> array (layout as above) and one
+non-nullable 4-byte physical value array having Int32 (not-null) logical
+type. Here is a diagram showing the full physical layout of this struct:
+<img src="diagrams/layout-list-of-struct.png" width="400"/>
+While a struct does not have physical storage for each of its semantic slots
+(i.e. each scalar C-like struct), an entire struct slot can be set to null via
+the bitmask. Whether each of the child field arrays can have null values
+depends on whether or not the respective relative type is nullable.
+## Dense union type
+A dense union is semantically similar to a struct, and contains an ordered
+sequence of relative types. While a struct contains multiple arrays, a union is
+semantically a single array in which each slot can have a different type.
+The union types may be named, but like structs this will be a matter of the
+metadata and will not affect the physical memory layout.
+We define two distinct union types that are optimized for different use
+cases. This first, the dense union, represents a mixed-type array with 6 bytes
+of overhead for each value. Its physical layout is as follows:
+* One child array for each relative type
+* Types array: An array of unsigned integers, enumerated from 0 corresponding
+  to each type, with the smallest byte width capable of representing the number
+  of types in the union.
+* Offsets array: An array of signed int32 values indicating the relative offset
+  into the respective child array for the type in a given slot. The respective
+  offsets for each child value array must be in order / increasing.
+Alternate proposal (TBD): the types and offset values may be packed into an
+int48 with 2 bytes for the type and 4 bytes for the offset.
+Critically, the dense union allows for minimal overhead in the ubiquitous
+union-of-structs with non-overlapping-fields use case (Union<s1: Struct1, s2:
+Struct2, s3: Struct3, …>)
+Here is a diagram of an example dense union:
+<img src="diagrams/layout-dense-union.png" width="400"/>
+## Sparse union type
+A sparse union has the same structure as a dense union, with the omission of
+the offsets array. In this case, the child arrays are each equal in length to
+the length of the union. This is analogous to a large struct in which all
+fields are nullable.
+While a sparse union may use significantly more space compared with a dense
+union, it has some advantages that may be desirable in certain use cases:
+<img src="diagrams/layout-sparse-union.png" width="400"/>
+More amenable to vectorized expression evaluation in some use cases.
+Equal-length arrays can be interpreted as a union by only defining the types array
+Note that nested types in a sparse union must be internally consistent
+(e.g. see the List in the diagram), i.e. random access at any index j yields
+the correct value.
+## References
+Drill docs
diff --git a/format/ b/format/
new file mode 100644
index 0000000..1120e62
--- /dev/null
+++ b/format/
@@ -0,0 +1,5 @@
+## Arrow specification documents
+> **Work-in-progress specification documents**. These are discussion documents
+> created by the Arrow developers during late 2015 and in no way represents a
+> finalized specification.
diff --git a/format/diagrams/layout-dense-union.png b/format/diagrams/layout-dense-union.png
new file mode 100644
index 0000000..5f1f381
Binary files /dev/null and b/format/diagrams/layout-dense-union.png differ
diff --git a/format/diagrams/layout-list-of-list.png b/format/diagrams/layout-list-of-list.png
new file mode 100644
index 0000000..5bc0078
Binary files /dev/null and b/format/diagrams/layout-list-of-list.png differ
diff --git a/format/diagrams/layout-list-of-struct.png b/format/diagrams/layout-list-of-struct.png
new file mode 100644
index 0000000..00d6c6f
Binary files /dev/null and b/format/diagrams/layout-list-of-struct.png differ
diff --git a/format/diagrams/layout-list.png b/format/diagrams/layout-list.png
new file mode 100644
index 0000000..167b10b
Binary files /dev/null and b/format/diagrams/layout-list.png differ
diff --git a/format/diagrams/layout-primitive-array.png b/format/diagrams/layout-primitive-array.png
new file mode 100644
index 0000000..bd212f0
Binary files /dev/null and b/format/diagrams/layout-primitive-array.png differ
diff --git a/format/diagrams/layout-sparse-union.png b/format/diagrams/layout-sparse-union.png
new file mode 100644
index 0000000..450ea29
Binary files /dev/null and b/format/diagrams/layout-sparse-union.png differ

View raw message