hbase-dev mailing list archives

From stack <st...@duboce.net>
Subject Proposal: Make Rows and Columns byte arrays rather than Text (HBASE-82)
Date Fri, 25 Apr 2008 18:54:20 GMT
Below are some notes on HBASE-82 and the issues we'll have to deal with 
when implementing it.  There's been back and forth in the issue, but it's a 
big change so I thought I'd put a note up here before digging in, in case 
there are objections or suggestions that others might have.

Text Rows are 'bad' because, borrowing heavily from Kevin Beyer's 
arguments made in HBASE-82:

+ You are out of luck if you want a sort other than UTF-8: e.g. ignore 
case or accents, or you want a language-specific sort
+ JAQL, for example, wants to be able to use any 'type' as a key: "Jaql 
supports an extended JSON data model...(almost) any JSON value can be 
used as a map/reduce key, join key, group-by key, or sort key. For 
example, a (string,int)-pair like ["astring",17] can be a key."
+ If we want to make a lean and mean data, as opposed to 'message', 
transfer channel, we need rows (and columns) to be byte arrays.

Here is a bit more on the latter point:

HBase uses the one RPC mechanism for messaging between clients and 
servers -- "Where's the -ROOT-?", "I'm here still", and "I opened 
successfully that region you asked me to open!" -- and for data transfer 
("Here is the 10Millionth 4k cell that I've given you in the last five 
minutes").  Our RPC is flexible: you can easily change method signatures 
and add new method invocations because it uses reflection to figure out 
what to invoke and where to run it.  Parameters are Java objects that are 
serialized crossing the RPC divide.  This flexibility comes at some 
considerable cost.  While a flexible RPC makes sense for message passing 
-- messages are rare in the scheme of things and the objects carried can 
be a little involved -- it does not make sense for transferring dumb data, 
particularly when the need is to be able to do it tens of thousands of 
times a second and what we're passing is basically just byte arrays.  At 
least, they should be just byte arrays.  Currently, in hbase, while cell 
values are byte arrays, their coordinates are not: row and column are 
Text objects.  Also, we currently package up results into Java objects: 
Cells, Maps, and RowResults.  HBASE-82 would make it possible to instead 
pass simple array-of-arrays data structures when making data transfers.
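
To make the "arrays of arrays" idea concrete, here is a minimal sketch of what a data-type payload could look like: a count followed by length-prefixed byte arrays.  The class name ArrayOfArrays and the framing itself are assumptions made up for illustration; this is not an existing hbase wire format.

```java
import java.nio.ByteBuffer;

// Hypothetical framing for a "data"-type transfer (an assumption, not an
// existing hbase format): <int count> then, per array, <int length><bytes>.
class ArrayOfArrays {

  // Pack the arrays into one flat buffer with no per-object serialization.
  static byte[] encode(byte[][] arrays) {
    int size = 4; // room for the leading count
    for (byte[] a : arrays) {
      size += 4 + a.length; // length prefix plus payload
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(arrays.length);
    for (byte[] a : arrays) {
      buf.putInt(a.length);
      buf.put(a);
    }
    return buf.array();
  }

  // Read back exactly what encode() wrote.
  static byte[][] decode(byte[] wire) {
    ByteBuffer buf = ByteBuffer.wrap(wire);
    byte[][] arrays = new byte[buf.getInt()][];
    for (int i = 0; i < arrays.length; i++) {
      arrays[i] = new byte[buf.getInt()];
      buf.get(arrays[i]);
    }
    return arrays;
  }
}
```

A row, column and value would then travel as three plain byte arrays, and the client could build Cells or Maps out of them on its side if it wanted to.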

HBASE-82 would:

+ Change rows and columns to be byte arrays rather than Text
+ Require administrators to supply, on table creation, an instance of 
org.apache.hadoop.io.RawComparator -- a byte array comparator -- to use 
when comparing row keys.
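
As a rough sketch of what such comparators might look like: the RowComparator interface below is a local stand-in that copies the byte-range compare signature of org.apache.hadoop.io.RawComparator so the example compiles without Hadoop on the classpath.  The two implementations and their names are hypothetical, not anything in hbase today.

```java
import java.nio.charset.StandardCharsets;

// Local stand-in (an assumption) mirroring the byte-range compare method of
// org.apache.hadoop.io.RawComparator: compare l1 bytes of b1 starting at s1
// with l2 bytes of b2 starting at s2.
interface RowComparator {
  int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Plain lexicographic order over unsigned bytes; allocates nothing.
class BytesRowComparator implements RowComparator {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int n = Math.min(l1, l2);
    for (int i = 0; i < n; i++) {
      int a = b1[s1 + i] & 0xff; // mask so 0x80..0xff sort after 0x00..0x7f
      int b = b2[s2 + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return l1 - l2; // on a shared prefix, the shorter key sorts first
  }
}

// Case-insensitive order.  It decodes both keys as UTF-8, so it creates
// two String objects on every compare.
class IgnoreCaseRowComparator implements RowComparator {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    String a = new String(b1, s1, l1, StandardCharsets.UTF_8);
    String b = new String(b2, s2, l2, StandardCharsets.UTF_8);
    return a.compareToIgnoreCase(b);
  }
}
```

With the byte comparator "Banana" sorts before "apple" (0x42 is less than 0x61); with the ignore-case one it sorts after.  The table would use whichever instance it was created with.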


Advantages:

+ Byte arrays are the lowest common denominator.  Any Writable can be a 
key, for instance.
+ It should be possible to make servers run faster since data transfer 
could be made to incur less overhead.


Disadvantages:

+ If the comparator needs to do more than a byte compare, then it needs to 
instantiate two classes for every compare (not the case for UTF-8 IIRC).
+ Massive migration headache -- rewriting of every file (we need 
versioning of files stored in hbase).


Other issues and notes:

+ How do we add new comparators to CLASSPATH on a running cluster?  We 
have this problem with filters also, so we should come up with a general 
solution (as Kevin Beyer has noted).  Excepting restart -- not a soln. 
for REAL clusters -- we need a means of dynamically adding to the 
CLASSPATH of a running hbase instance.  In MR, it's easier since tasks 
are each a new JVM invocation.  We need an HTTP/URL classloader or a 
classloader that watches a folder in HDFS looking for additions or 
removals, loading whatever is in it.  Would suggest this feature doesn't 
have to be dealt with for HBASE-82.
+ Regarding column names as byte arrays: the Bigtable paper says 
column family names need to be 'printable'.  We should have the same 
requirement.  Presume the column family prefix is UTF-8 encoded.  That 
way we can find the family/qualifier ':' delimiter in the column name 
byte array.
+ Client API could be mostly the same; where now we have methods taking 
rows and columns of Text, these would call through to the versions that 
take byte [].  Client would still return Cell and Maps of column to Cells, 
but the client would make these up for the user out of arrays of byte 
arrays passed back from the server (the API could also expose raw 
array-of-arrays calls).
+ Would further customize RPC so there are two types of invocation: data 
or message.  There would be a 'RegionServer RPC Server' that was hardwired 
into regionserver going direct to regionserver methods rather than via 
+ While get and batch update can be made to use byte arrays, scanners 
are a little awkward.  Scanner setup would be message-type RPC call but 
the next'ing calls against the scanner would be data-type RPC calls.
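
On the point above about finding the family/qualifier ':' delimiter in a column name byte array, a minimal sketch follows.  It assumes the family part is UTF-8 and never contains ':' itself, while the qualifier can be arbitrary bytes.  The ColumnNames class and its method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper (not an existing hbase class) that splits a column
// name of the form <family>:<qualifier> held in a byte array.
class ColumnNames {
  static final byte DELIMITER = (byte) ':';

  // Index of the first ':' byte, or -1 if there is none.  In UTF-8, the
  // bytes of a multi-byte character are all >= 0x80, so a 0x3a byte inside
  // the family part can only be a real ':' character.
  static int delimiterIndex(byte[] column) {
    for (int i = 0; i < column.length; i++) {
      if (column[i] == DELIMITER) {
        return i;
      }
    }
    return -1;
  }

  // Bytes before the first ':'.
  static byte[] family(byte[] column) {
    return Arrays.copyOfRange(column, 0, requireDelimiter(column));
  }

  // Bytes after the first ':'; these may hold anything, including ':'.
  static byte[] qualifier(byte[] column) {
    return Arrays.copyOfRange(column, requireDelimiter(column) + 1, column.length);
  }

  private static int requireDelimiter(byte[] column) {
    int index = delimiterIndex(column);
    if (index < 0) {
      throw new IllegalArgumentException("column name has no ':' delimiter");
    }
    return index;
  }
}
```

Because only the first ':' counts, a binary qualifier that happens to contain 0x3a is left intact.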

