hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10756) Adding Data Types and Structured Row Keys in 0.89-fb HBase
Date Mon, 17 Mar 2014 21:00:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938376#comment-13938376 ]

stack commented on HBASE-10756:

[~manukranthk] Sweet.

Here is the short version.  If you want more, just say; we can do a phone call and I'll catch
you up.

In our little hbase ecosystem, there are as many type systems and type serializations as there
are tools on top.  Kiji, Kite, Phoenix (and others such as Splice Machine) have all come up
w/ their own way of serializing types into HBase and, beyond this, of serializing in a
manner that preserves order when values are used as row key parts: e.g. flipping the sign bit
so negative numbers sort before positive numbers, etc.  Kiji and Kite depend on Avro type
serializations with customizations.  Phoenix has its own system.
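
To make the sign-bit trick concrete, here is a minimal sketch in Python.  It is not the
encoding used by any of the projects named above; the function names encode_int32 and
decode_int32 are made up for illustration, and it assumes row keys are compared as unsigned
bytes, left to right.

```python
import struct


def encode_int32(value: int) -> bytes:
    """Encode a signed 32-bit int so unsigned byte order matches numeric order.

    Hypothetical helper for illustration only.
    """
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of int32 range")
    # Reinterpret as unsigned two's complement, then flip the sign bit so
    # negative values get a leading 0 bit and non-negative values a leading 1.
    flipped = (value & 0xFFFFFFFF) ^ 0x80000000
    return struct.pack(">I", flipped)  # big-endian: most significant byte first


def decode_int32(data: bytes) -> int:
    """Invert encode_int32."""
    (flipped,) = struct.unpack(">I", data)
    unsigned = flipped ^ 0x80000000
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


values = [5, -1, 0, -300, 2**31 - 1, -2**31]
# Sorting the encoded bytes gives the same order as sorting the numbers.
decoded = [decode_int32(b) for b in sorted(encode_int32(v) for v in values)]
assert decoded == sorted(values)
```

Without the flip, plain big-endian two's complement would put every negative number after
every positive one under byte-wise comparison.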

Each has its own way of specifying the key format, usually as a serialized data structure
specified variously (as 'special columns' in phoenix or via avro customizations in kite).

Chatting offline, the thought was that to get all these systems interacting, we need to agree
on a first step: a serialization format (later we can come along and agree on how to spec
rowkeys, schema evolution...).  So, can we agree on how to serialize an int, sql types, and
complex types into a cell?

In an effort at a serialization esperanto, [~ndimiduk] built the OrderedTypes and the content
of the types package in hbase, originally toward the end of last year, as a system that Hive might
move to (this project is apparently on hold at the moment).  Could this effort be the common
format we all use?

Phoenix has said already that it will move to Nick's system.  The Kite folks are looking at
it to see if it could serve as the serialization basis for kite.  Would it work for Presto?

It is an amalgam of the Orderly project, phoenix serializations, and SQLite.  See here for
more on launching the project http://search-hadoop.com/m/JfPZzujFjZ  and here for an overview:
https://issues.apache.org/jira/secure/attachment/12589798/hbase%20data%20types%20WIP.pdf (all
from HBASE-8089).

Good on you.

> Adding Data Types and Structured Row Keys in 0.89-fb HBase
> ----------------------------------------------------------
>                 Key: HBASE-10756
>                 URL: https://issues.apache.org/jira/browse/HBASE-10756
>             Project: HBase
>          Issue Type: New Feature
>          Components: Usability
>    Affects Versions: 0.89-fb
>            Reporter: Manukranth Kolloju
>            Assignee: Manukranth Kolloju
>             Fix For: 0.89-fb
> As an extension to some of the work done on the Presto + HBase side, and also inspired by
some of the work done in open source and Phoenix, introducing data types and structured row
keys will enable the database (hbase) to de-couple database level optimizations from the application
level schema. The attempt is to provide a table definition & specification to define the
row key structure, which can be defined as a composite struct of primitive data types.
> The database can make intelligent decisions about how to interpret the data. For instance,
an understanding of the structure of the row key tells the database which parts
of the data are valuable, and it can use that information to construct indexes/bloom filters
based on these parts of the row key.
> This can be extended to the column qualifiers and Nested Types as well.
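
As a rough illustration of a row key built as a composite struct of primitive types, here is
a minimal Python sketch.  It is not the design proposed in this issue or the HBase types
package API: the field names (region, user_id), the helper names, and the NUL-terminated
string encoding are all assumptions made for the example.

```python
import struct


def encode_int32(value: int) -> bytes:
    """Order-preserving int32: big-endian with the sign bit flipped (hypothetical helper)."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of int32 range")
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def encode_str(text: str) -> bytes:
    """Order-preserving string: UTF-8 bytes plus a 0x00 terminator.

    Assumes the text contains no NUL character; a real format would need escaping.
    """
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("NUL bytes are not supported in this sketch")
    return raw + b"\x00"


def encode_row_key(region: str, user_id: int) -> bytes:
    """Concatenate the encoded fields in declaration order."""
    return encode_str(region) + encode_int32(user_id)


rows = [("eu", 7), ("eu", -2), ("asia", 100), ("eu-west", 0)]
by_key = sorted(rows, key=lambda r: encode_row_key(*r))
# Byte order of the composite key matches field-by-field order of the tuples.
assert by_key == sorted(rows)

# The encoded first field is a prefix shared by every key in that region,
# which is the kind of key part an index or bloom filter could be built on.
prefix = encode_str("eu")
assert [r for r in rows if encode_row_key(*r).startswith(prefix)] == [("eu", 7), ("eu", -2)]
```

The terminator keeps "eu" from being confused with "eu-west" when the fields are concatenated,
so a known key structure lets the store pick out a leading field without application help.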

