Repository: parquet-format
Updated Branches:
refs/heads/master 863875e0b -> ddc18a7af
PARQUET-1125: Add UUID logical type.
UUIDs are commonly used as unique identifiers. A binary representation will reduce memory use
when writing values or building bloom filters, and will reduce the cycles needed to compare values.
This commit is based on PARQUET-906 / PR #51.
Author: Ryan Blue <blue@apache.org>
Closes #71 from rdblue/PARQUET-1125-add-uuid-logical-type and squashes the following commits:
dc01707 [Ryan Blue] PARQUET-1125: Add UUID logical type.
Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/ddc18a7a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/ddc18a7a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/ddc18a7a
Branch: refs/heads/master
Commit: ddc18a7af21127f9100096b5b356d1cad888d174
Parents: 863875e
Author: Ryan Blue <blue@apache.org>
Authored: Tue Oct 10 12:53:19 2017 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Tue Oct 10 12:53:19 2017 -0700
----------------------------------------------------------------------
LogicalTypes.md | 13 ++++++++++++-
src/main/thrift/parquet.thrift | 1 +
2 files changed, 13 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/parquet-format/blob/ddc18a7a/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index c50b96b..2c80256 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -48,7 +48,18 @@ was converted from an enumerated type in another data model (e.g. Thrift, Avro,
Applications using a data model lacking a native enum type should interpret `ENUM`
annotated field as a UTF-8 encoded string.
-The sort order used for `ENUM`s is `UNSIGNED` byte-wise comparison.
+The sort order used for `ENUM` values is unsigned byte-wise comparison.
+
+### UUID
+
+`UUID` annotates a 16-byte fixed-length binary. The value is encoded using
+big-endian, so that `00112233-4455-6677-8899-aabbccddeeff` is encoded as the
+bytes `00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff`
+(This example is from [wikipedia's UUID page][wiki-uuid]).
+
+The sort order used for `UUID` values is unsigned byte-wise comparison.
+
+[wiki-uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier
## Numeric Types
http://git-wip-us.apache.org/repos/asf/parquet-format/blob/ddc18a7a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 4c76cbd..a4e193e 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -226,6 +226,7 @@ struct Statistics {
/** Empty structs to use as logical type annotations */
struct StringType {} // allowed for BINARY, must be encoded with UTF-8
+struct UUIDType {} // allowed for FIXED[16], must be encoded as raw UUID bytes
struct MapType {} // see LogicalTypes.md
struct ListType {} // see LogicalTypes.md
struct EnumType {} // allowed for BINARY, must be encoded with UTF-8
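
For readers who want to see the encoding described in the LogicalTypes.md hunk in action, here is a minimal sketch in Python. It is not part of this commit and does not use any Parquet library; the helper names `uuid_to_fixed16` and `uuid_from_fixed16` are hypothetical and exist only for illustration. It relies on the standard library's `uuid.UUID.bytes`, which yields the 16 bytes in big-endian order, and on the fact that Python compares `bytes` objects as unsigned byte sequences, which matches the sort order stated for `UUID` values.

```python
import uuid


def uuid_to_fixed16(value: uuid.UUID) -> bytes:
    """Hypothetical helper: return the 16-byte big-endian form of a UUID."""
    return value.bytes


def uuid_from_fixed16(raw: bytes) -> uuid.UUID:
    """Hypothetical helper: rebuild a UUID from its 16-byte big-endian form."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


# The example value from the LogicalTypes.md text in this diff.
example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
encoded = uuid_to_fixed16(example)
round_tripped = uuid_from_fixed16(encoded)

# Two values that differ in the top bit of the first byte. Under unsigned
# byte-wise comparison 0x7f sorts before 0x80; a signed comparison would
# reverse them.
low = uuid.UUID("7fffffff-ffff-ffff-ffff-ffffffffffff")
high = uuid.UUID("80000000-0000-0000-0000-000000000000")
low_sorts_first = uuid_to_fixed16(low) < uuid_to_fixed16(high)

print(encoded.hex())
print(round_tripped)
print(low_sorts_first)
```

The `low`/`high` pair is the case to watch in languages whose byte or 64-bit integer types are signed: comparing those values directly would order them the other way, so an implementation there would need an explicitly unsigned comparison to match the documented sort order.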