Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Mon, 17 Mar 2014 21:00:45 +0000 (UTC)
From: "stack (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12701619.1394835936838.92392.1395090045078@arcas>
In-Reply-To: <JIRA.12701619.1394835936838@arcas>
References: <JIRA.12701619.1394835936838@arcas>
Subject: [jira] [Commented] (HBASE-10756) Adding Data Types and Structured
 Row Keys in 0.89-fb HBase
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938376#comment-13938376 ] 

stack commented on HBASE-10756:
-------------------------------

[~manukranthk] Sweet.

Here is the short version.  If you want more, just say so; we can do a phone call and I'll catch you up.

In our little HBase ecosystem, there are as many type systems and type serializations as there are tools on top.  Kiji, Kite, Phoenix (and others such as Splice Machine) have all come up with their own way of serializing types into HBase and, beyond this, of serializing in a manner that preserves order when values are used as row key parts: e.g. flipping the sign bit so negative numbers sort before positive numbers, etc.  Kiji and Kite depend on Avro type serializations with customizations.  Phoenix has its own system.
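
To make the sign-bit trick concrete, here is a minimal sketch. The class and method names are hypothetical and are not the API of the HBase types package or of any of the tools above; it only assumes that row keys are compared as unsigned bytes, left to right.

```java
/** Hypothetical illustration only; not the HBase OrderedBytes/types API. */
final class OrderedIntSketch {

    /** Encode an int as 4 big-endian bytes with the sign bit flipped. */
    static byte[] encode(int value) {
        // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto
        // 0x00000000..0xFFFFFFFF, so unsigned byte order matches numeric order.
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Reverse of encode: reassemble the big-endian bytes, then flip the sign bit back. */
    static int decode(byte[] b) {
        int flipped = ((b[0] & 0xFF) << 24)
                    | ((b[1] & 0xFF) << 16)
                    | ((b[2] & 0xFF) << 8)
                    | (b[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    /** Unsigned lexicographic byte comparison, the ordering assumed for row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With this encoding, compareUnsigned(encode(-5), encode(3)) is negative, whereas a plain big-endian two's-complement encoding would sort -5 after 3.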

Each has its own way of specifying the key format, usually as a serialized data structure specified variously (as 'special columns' in Phoenix or via Avro customizations in Kite).

Chatting offline, the thought was that to get all these systems interacting, we need to agree on a first step: a serialization format (later we can come along and agree on how to spec rowkeys, schema evolution...).  So, can we agree on how to serialize an int, SQL types, and complex types into a cell?

In an effort at a serialization esperanto, [~ndimiduk] built the OrderedTypes and the content of the types package in HBase toward the end of last year, originally as a system that Hive might move to (that project is apparently on hold at the moment).  Could this effort be the common format we all use?

Phoenix has already said that it will move to Nick's system.  The Kite folks are looking at it to see if it could serve as the serialization basis for Kite.  Would it work for Presto [~manukranthk]?

It is an amalgam of the Orderly project, Phoenix serializations, and SQLite.  See here for more on the launch of the project: http://search-hadoop.com/m/JfPZzujFjZ and here for an overview: https://issues.apache.org/jira/secure/attachment/12589798/hbase%20data%20types%20WIP.pdf (all from HBASE-8089).

Good on you.

> Adding Data Types and Structured Row Keys in 0.89-fb HBase
> ----------------------------------------------------------
>
>                 Key: HBASE-10756
>                 URL: https://issues.apache.org/jira/browse/HBASE-10756
>             Project: HBase
>          Issue Type: New Feature
>          Components: Usability
>    Affects Versions: 0.89-fb
>            Reporter: Manukranth Kolloju
>            Assignee: Manukranth Kolloju
>             Fix For: 0.89-fb
>
>
> As an extension to some of the work done on the Presto + HBase side, and also inspired by some of the work done in open source and Phoenix, introducing data types and structured row keys will enable the database (HBase) to decouple database-level optimizations from the application-level schema. The attempt is to provide a table definition and specification that defines the row key structure as a composite struct of primitive data types.
> The database can then make intelligent decisions about how to interpret the data. For instance, an understanding of the structure of the row key tells the database which parts of the data are valuable, and it can use that information to construct indexes/bloom filters based on those parts of the row key.
> This can be extended to the column qualifiers and Nested Types as well.


--
This message was sent by Atlassian JIRA
(v6.2#6252)