hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-82) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
Date Fri, 15 Feb 2008 12:34:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569252#action_12569252 ]

Jim Kellerman commented on HBASE-82:

I've been doing some thinking about this and would like to address some arguments that have
been made both for and against this change:

With respect to custom WritableComparables and Comparators:
- It would not be too hard to get custom code to the servers. It just means that the custom
code has to be distributed to every node running HBase and the classpath has to be adjusted.
This is also a requirement for using custom classes in Map/Reduce, so people are already familiar
with doing this.
- However, having said that, I do agree that serialization and deserialization of custom classes
is much more expensive than for a restricted set of "known" classes.

With respect to using byte arrays as keys:
- This is appealing to me. Serialization and deserialization are easy and fast, as is comparison.
It would be easy to extend ImmutableBytesWritable to implement Comparable.
- BerkeleyDB uses byte arrays as keys and values and allows either to be up to 4GB (unsigned
int) in length. We could live with the 2GB limit that Java's signed int imposes.
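
To make the byte-array option concrete, here is a minimal sketch of a comparable byte-array
key. ByteKey is a hypothetical class written for illustration only; it is not
ImmutableBytesWritable and does not implement Writable. It assumes keys should sort as
unsigned bytes, lexicographically, with a shorter key sorting before any longer key that
it prefixes.

```java
import java.util.Arrays;

/** Hypothetical immutable byte-array key (illustration only, not an HBase class). */
final class ByteKey implements Comparable<ByteKey> {
    private final byte[] bytes;

    ByteKey(byte[] source) {
        // Defensive copy so the key cannot change after construction.
        this.bytes = Arrays.copyOf(source, source.length);
    }

    int length() {
        return bytes.length;
    }

    @Override
    public int compareTo(ByteKey other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so that 0x80 sorts after 0x7f (unsigned order).
            int a = bytes[i] & 0xff;
            int b = other.bytes[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return bytes.length - other.bytes.length;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Comparison here touches only raw bytes, with no decoding step, which is the property that
makes byte-array keys attractive.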

The biggest downside to changing keys from Text to anything else is migration. However:
- We do have a migration framework in place.
- If we are going to make this change, it would be better to do it sooner rather than later,
before users have hundreds of GB of data stored in HBase.
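- It is easy to convert between Text and byte[]:
    byte[] b = something;   // placeholder for the key bytes
    Text t = new Text(b);
    // getBytes() returns the backing array; only the first getLength() bytes are valid
    byte[] b2 = java.util.Arrays.copyOf(t.getBytes(), t.getLength());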
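
One detail matters for conversions like these: a buffer's backing array can be longer than
the data it holds, so the valid length has to travel with the bytes. The plain-Java sketch
below imitates that situation without any Hadoop classes. PaddedBufferDemo, validBytes, and
the 100-byte buffer are all assumptions made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo of why the valid length matters (not Hadoop or HBase code). */
final class PaddedBufferDemo {

    /** Copies only the valid prefix of a possibly over-allocated buffer. */
    static byte[] validBytes(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    public static void main(String[] args) {
        byte[] key = "row-1".getBytes(StandardCharsets.UTF_8);

        // Assumed over-allocated buffer: 100 bytes of capacity, key.length bytes of data.
        byte[] backing = new byte[100];
        System.arraycopy(key, 0, backing, 0, key.length);

        // Decoding the whole backing array keeps the zero padding as NUL characters.
        String padded = new String(backing, StandardCharsets.UTF_8);
        // Decoding only the valid prefix recovers the original key.
        String trimmed = new String(validBytes(backing, key.length), StandardCharsets.UTF_8);

        System.out.println("padded length:  " + padded.length());
        System.out.println("trimmed length: " + trimmed.length());
    }
}
```

A key type whose array is exactly as long as its contents avoids this bookkeeping entirely.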

> [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
> ----------------------------------------------------------------------
>                 Key: HBASE-82
>                 URL: https://issues.apache.org/jira/browse/HBASE-82
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: Jim Kellerman
>            Priority: Minor
> I have heard from several people that row keys in HBase should be less restricted than hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead to the most
> general case being either hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable. The
> primary difference between these two classes is that hadoop.io.BytesWritable by default allocates
> 100 bytes, and if you do not pay attention to the length (BytesWritable.getSize()), converting
> a String to a BytesWritable and vice versa can become problematic.
> hbase.io.ImmutableBytesWritable, in contrast, only allocates as many bytes as you pass
> in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for ImmutableBytesWritable,
> because it has a fixed size once set, and operations like get, etc. do not have to do something
> like System.arraycopy where you specify the number of bytes to copy.
> Your comments and questions are welcome on this issue. If we receive enough feedback that
> Text is too restrictive, we are willing to change it, but we need to hear what would be the
> most useful thing to change it to as well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
