hbase-dev mailing list archives

From stack <st...@duboce.net>
Subject Re: Proposal: Make Rows and Columns byte arrays rather than Text (HBASE-82)
Date Sat, 26 Apr 2008 18:07:29 GMT
Jim Kellerman wrote:
>> -----Original Message-----
>> From: stack [mailto:stack@duboce.net]
>> + If comparator needs to do more than byte compare, then needs to
>> instantiate two classes for every compare (Not the case for
>> UTF-8 IIRC).
> Text instantiates an inner class for its comparator.
I don't follow.  Text does a one-time registration of an inner static 
class with WritableComparator on class loading.  Above I was talking 
about the fact that the signature in RawComparator is:

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

This is what we'd call comparing keys.  If your compare was other than 
byte array compares, then you'd have to instantiate objects from the 
passed in byte arrays per compare -- expensive -- but I think we can get 
away with just comparing bytes when the keys are/were Text (have to 
check -- looks like first byte is length).
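
A minimal sketch of what "just comparing bytes" could look like behind that 
signature.  The class name ByteKeyComparator is made up for illustration and 
is not an existing HBase or Hadoop class; the sketch assumes keys should sort 
as unsigned bytes, which matches the byte order of UTF-8 strings.

```java
// Hypothetical helper for illustration only; not an HBase or Hadoop class.
final class ByteKeyComparator {
    private ByteKeyComparator() {}

    /**
     * Lexicographic compare of two byte ranges, same parameter shape as
     * the RawComparator signature quoted above. No objects are created.
     */
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7f (Java bytes are signed).
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return l1 - l2;
    }
}
```

Because the compare works directly on the passed-in ranges, nothing has to be 
deserialized per call.  Whether a serialized Text key can be fed in unchanged 
depends on how its length prefix is handled, which is the part that still 
needs checking.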

But thinking on it, we cannot support a comparator-per-table.  All 
tables are kept in the .META. catalog table.  There'd be havoc or worse, 
subtle bugs, if the regions inserted were meant to respect an order 
other than that of .META.

>> + Massive migration headache -- rewriting of every file (Need
>> versioning
>> of files stored in hbase).
Thinking on this one more, if straight byte-compare will work for keys 
that were inserted as Text, then things should work without need for a 
new migration step.

>> Issues:
>> + How do we add new comparators to CLASSPATH on a running cluster?  We
>> have this problem regards filters also so should come up with
>> a general solution (As Kevin Beyer has noted).  Excepting
>> restart -- not a soln.
> Not sure this is necessary if we store the comparator with the schema and since the schema
> is slated to be removed from HRegionInfo and stored in a 'well known' location, the comparator
> can be read in with the schema, a once only operation per table.
Even if one-time only, still need a one-time only classloader that can 
read from wherever the 'well-known' location is.

But I think regards comparators -- filters are another story -- we don't 
have to worry about it since we can't run tables w/ comparators that 
don't agree.

>> + Regards column names as byte arrays, in the Bigtable paper, it says
>> column family names need to be 'printable'.  We should have
>> same requirement.  Presume the column family preface is UTF-8
>> encoded.  It will make it so we can find the family/qualifier
>> ':' delimiter in the column name byte array.
> Do we need byte[] qualifiers? Perhaps Kevin can chime in here.
We could do that but I think it'd just look ugly in client API having 
columns be two-part.  In the interface between client and server, could 
have column naming be two-part but even here might be just more pain.
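
To illustrate the single-array approach: a sketch of finding the 
family/qualifier ':' delimiter in a column name byte array, assuming the 
family prefix is UTF-8 encoded and the qualifier is arbitrary bytes.  The 
class and method names are hypothetical, not existing HBase API.  In UTF-8 
the byte 0x3A only ever encodes ':' itself (all bytes of multi-byte sequences 
are 0x80 or above), so the first 0x3A in the array marks the end of the 
family.

```java
// Hypothetical helper for illustration only; not an HBase class.
final class ColumnNameBytes {
    private ColumnNameBytes() {}

    static final byte DELIMITER = (byte) ':';

    /** Index of the first ':' byte, or -1 if the name has no delimiter. */
    static int delimiterIndex(byte[] column) {
        for (int i = 0; i < column.length; i++) {
            if (column[i] == DELIMITER) {
                return i;
            }
        }
        return -1;
    }

    /** Returns { family, qualifier }; the delimiter itself is dropped. */
    static byte[][] split(byte[] column) {
        int idx = delimiterIndex(column);
        if (idx < 0) {
            throw new IllegalArgumentException("no ':' delimiter in column name");
        }
        byte[] family = java.util.Arrays.copyOfRange(column, 0, idx);
        // The qualifier may hold any bytes, including further ':' bytes.
        byte[] qualifier = java.util.Arrays.copyOfRange(column, idx + 1, column.length);
        return new byte[][] { family, qualifier };
    }
}
```

With something like this, the client API can keep a single byte[] column 
argument and the two-part view is only computed where it is needed.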

>> + Would further customize RPC so two types of invocation: data or
>> message.  There would be 'RegionServer RPC Server' that was
>> hardwired into regionserver going direct to regionserver
>> methods rather than via reflection.
>> + While get and batch update can be made to use byte arrays, scanners
>> are a little awkward.  Scanner setup would be message-type
>> RPC call but the next'ing calls against the scanner would be
>> data-type RPC calls.
> I think changing the RPC should be a separate issue. This will be a big enough change
> as it is.

HBASE-82 is just about making keys and columns byte arrays rather than Text.