hbase-user mailing list archives

From Michael Dalton <mwdal...@gmail.com>
Subject just open sourced Orderly -- a row key schema system (composite keys, etc) for use with HBase
Date Thu, 14 Apr 2011 00:55:02 GMT
Hi all,

I'm with a startup, GotoMetrics, doing things with Hadoop, and I've gotten
permission to open source Orderly -- our row key schema system for use in
projects like HBase. Orderly allows you to serialize common data types
(long, double, BigDecimal, etc.) or structs/records of these types to byte
arrays, and ensures that the byte arrays sort in the same natural order as
the data type. You may then use the byte arrays as keys in HBase (or any
sorted, byte-typed key-value store).
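
To illustrate what "sorts in the same natural order" means, here is a minimal sketch for a fixed-width long. It is not Orderly's actual format or API (the class and method names are hypothetical, and Orderly itself describes variable-length encodings); it only shows the general trick of flipping the sign bit so that an unsigned, byte-by-byte comparison, which is how a store like HBase orders keys, agrees with signed numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of Orderly.
final class SortableLongSketch {

    // Flip the sign bit and write big-endian, so negative values get a
    // leading byte below 0x80 and positive values a leading byte at or above it.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(value ^ Long.MIN_VALUE)
                .array();
    }

    // Reverse the transformation to recover the original value.
    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Without the sign-bit flip, the two's-complement bytes of -5 would start with 0xFF and sort after every positive value.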

I'd really appreciate feedback about which parts are useful (or not useful),
and if this would be something that would be appropriate to submit as a
contrib to HBase itself (or if people would prefer me to submit derivative
work to add composite row keys to Hive/Pig/etc).

Here are the interesting features:

   - All types are serialized to a byte array that sorts in the natural order
   of the underlying key for all key values (e.g., an Integer row key will sort
   correctly for negative/positive values, a double will sort correctly for
   negative/positive/zero/infinity/negative infinity/subnormals/etc - any valid
   value)
   - Both ascending and descending sort order are supported for all types
   - Designed for space efficiency - tricks like using the end of a byte
   array instead of a terminator byte, variable-length types whenever possible,
   etc are all employed to minimize serialization length
   - Support for row key prefixes/suffixes to combine with your own custom
   encodings
   - Variable-length integers (similar in theory to Zig-Zag encoding) are
   supported, and their byte serialization preserves sort ordering
   - BigDecimal support (like all other types, with sort ordering-preserving
   byte serialization). To the best of my knowledge the first byte-sortable
   BigDecimal serialization.
   - Float/Double
   - UTF-8 strings (with support for empty string, NULL, etc)
   - Almost all types encode NULL, and do so without using additional space
   (e.g., by using transformation on invalid UTF-8 encodings for Strings, NaNs
   removed during NaN canonicalization for doubles, etc). Null compares less
   than any non-null value
   - Support for struct (composite) row keys with an arbitrary number of
   fields. Each field may have its own sort order. Structs are sorted by field
   value.
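
As a rough sketch of how a composite key with per-field sort order can work, the example below builds a two-field key: a long in ascending order followed by a double in descending order. This is an assumption-laden illustration rather than Orderly's serialization format or API: the class and method names are hypothetical, both fields are fixed-width, and NULL handling is omitted. Descending order is obtained by inverting every bit of the ascending encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of Orderly.
final class CompositeKeySketch {

    // Map a double to a long whose unsigned order matches Double.compare:
    // negative values have all bits inverted, others only the sign bit.
    static long sortableBits(double d) {
        long bits = Double.doubleToLongBits(d); // also canonicalizes NaN
        return bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
    }

    // Key layout: [id ascending (8 bytes)][score descending (8 bytes)].
    static byte[] encode(long id, double score) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(id ^ Long.MIN_VALUE)     // ascending long
                .putLong(~sortableBits(score))    // inverted bits => descending
                .array();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Because both fields are fixed-width here, plain concatenation keeps the first field as the primary sort key. Variable-length fields would need a terminator or length scheme so that one field's bytes cannot be confused with the next.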

I have the code up on GitHub at http://github.com/mwdalton/orderly. There
are javadocs for all the row key types explaining their serialization format
and performance characteristics (start with the RowKey and StructRowKey
docs), as well as example code in src/example.

Please let me know if you have any questions or if there's anything that
would be useful to add/change. Thanks!

Best regards,

Mike
