parquet-commits mailing list archives

From b...@apache.org
Subject parquet-format git commit: PARQUET-1125: Add UUID logical type.
Date Tue, 10 Oct 2017 19:53:23 GMT
Repository: parquet-format
Updated Branches:
  refs/heads/master 863875e0b -> ddc18a7af


PARQUET-1125: Add UUID logical type.

UUIDs are commonly used as unique identifiers. A binary representation will reduce memory use
when writing or building bloom filters and will reduce the cycles needed to compare values.

This commit is based on PARQUET-906 / PR #51.

Author: Ryan Blue <blue@apache.org>

Closes #71 from rdblue/PARQUET-1125-add-uuid-logical-type and squashes the following commits:

dc01707 [Ryan Blue] PARQUET-1125: Add UUID logical type.


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/ddc18a7a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/ddc18a7a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/ddc18a7a

Branch: refs/heads/master
Commit: ddc18a7af21127f9100096b5b356d1cad888d174
Parents: 863875e
Author: Ryan Blue <blue@apache.org>
Authored: Tue Oct 10 12:53:19 2017 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Tue Oct 10 12:53:19 2017 -0700

----------------------------------------------------------------------
 LogicalTypes.md                | 13 ++++++++++++-
 src/main/thrift/parquet.thrift |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/ddc18a7a/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index c50b96b..2c80256 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -48,7 +48,18 @@ was converted from an enumerated type in another data model (e.g. Thrift, Avro,
 Applications using a data model lacking a native enum type should interpret `ENUM`
 annotated field as a UTF-8 encoded string. 
 
-The sort order used for `ENUM`s is `UNSIGNED` byte-wise comparison.
+The sort order used for `ENUM` values is unsigned byte-wise comparison.
+
+### UUID
+
+`UUID` annotates a 16-byte fixed-length binary. The value is encoded using
+big-endian, so that `00112233-4455-6677-8899-aabbccddeeff` is encoded as the
+bytes `00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff`
+(This example is from [wikipedia's UUID page][wiki-uuid]).
+
+The sort order used for `UUID` values is unsigned byte-wise comparison.
+
+[wiki-uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier
 
 ## Numeric Types
 

http://git-wip-us.apache.org/repos/asf/parquet-format/blob/ddc18a7a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 4c76cbd..a4e193e 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -226,6 +226,7 @@ struct Statistics {
 
 /** Empty structs to use as logical type annotations */
 struct StringType {}  // allowed for BINARY, must be encoded with UTF-8
+struct UUIDType {}    // allowed for FIXED[16], must be encoded as raw UUID bytes
 struct MapType {}     // see LogicalTypes.md
 struct ListType {}    // see LogicalTypes.md
 struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
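
The encoding and sort order described in the LogicalTypes.md hunk above can be illustrated with a minimal sketch. It is not part of the commit. It assumes Python's standard `uuid` module, whose `UUID.bytes` attribute yields the 16 bytes in big-endian order. The helper name `encode_uuid` and the sample values in the sort are hypothetical, chosen only for illustration.

```python
import uuid


def encode_uuid(text: str) -> bytes:
    """Return the 16-byte big-endian form of a UUID string (hypothetical helper)."""
    return uuid.UUID(text).bytes


# Example value taken from the LogicalTypes.md text in the diff.
example = "00112233-4455-6677-8899-aabbccddeeff"
encoded = encode_uuid(example)

print(encoded.hex(" "))            # 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
print(len(example), len(encoded))  # 36 16

# Python compares bytes objects lexicographically by unsigned byte value,
# which matches the unsigned byte-wise sort order the spec text describes.
# These three sample UUIDs are illustrative assumptions, not from the commit.
samples = [
    "ffffffff-0000-0000-0000-000000000000",
    "00000000-0000-0000-0000-000000000001",
    "7fffffff-ffff-ffff-ffff-ffffffffffff",
]
ordered = sorted(encode_uuid(s) for s in samples)
for value in ordered:
    print(uuid.UUID(bytes=value))
```

The 36-character text form shrinks to 16 bytes, which is the memory saving the commit message refers to. Under a signed byte comparison the value starting with `ff` would sort first, so the unsigned rule matters for statistics and ordering.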

