hadoop-common-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
Date Sun, 03 Feb 2008 21:37:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565228#action_12565228 ]

Bryan Duxbury commented on HADOOP-2334:
---------------------------------------

I think there are a number of reasons why this would end up being more trouble than it's worth.
If we templatized HTable, we'd have to make sure that the class you templatized with would
be available to the Master and RegionServers as well as the client. This means you'd essentially
need to make a custom build of HBase to use the templatized version of HTable. 

Also, the additional type check on every put/get/etc. would add overhead. HBase
isn't exactly the fastest product to start with, so this extra hit could hurt noticeably.

To top it all off, I'm not convinced that it's really that useful. What kind of things do
you need as keys that can't be serialized into Texts?
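
As a rough illustration of serializing a non-text key into text, the sketch below pads a numeric id to a fixed width so that lexicographic order matches numeric order. The class `TextKeySketch` and the helper `toTextKey` are hypothetical names, not part of HBase or Hadoop, and the sketch assumes keys are compared character by character as plain Java strings of ASCII digits.

```java
import java.util.Locale;

public class TextKeySketch {
    // Hypothetical helper: encode a non-negative long as fixed-width decimal text.
    // 19 digits is enough for Long.MAX_VALUE.
    static String toTextKey(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("negative ids are not handled in this sketch");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%019d", id);
    }

    public static void main(String[] args) {
        String a = toTextKey(9);
        String b = toTextKey(10);
        // Padded keys sort in numeric order.
        System.out.println(a + " sorts before " + b + ": " + (a.compareTo(b) < 0));
        // Unpadded decimal strings do not: "9" sorts after "10".
        System.out.println("\"9\" sorts before \"10\": " + ("9".compareTo("10") < 0));
    }
}
```

The padding costs extra bytes per key, which is one of the trade-offs a binary row key would avoid.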

> [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-2334
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2334
>             Project: Hadoop Core
>          Issue Type: Wish
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>            Reporter: Jim Kellerman
>            Priority: Minor
>
> I have heard from several people that row keys in HBase should be less restricted than
> hadoop.io.Text.
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead to the most
> general case being either hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable. The
> primary difference between these two classes is that hadoop.io.BytesWritable by default allocates
> 100 bytes, and if you do not pay attention to the length (BytesWritable.getSize()), converting
> a String to a BytesWritable and vice versa can become problematic.
> hbase.io.ImmutableBytesWritable, in contrast, only allocates as many bytes as you pass
> in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for ImmutableBytesWritable,
> because it has a fixed size once set, and operations like get, etc. do not have to do something
> like System.arraycopy where you specify the number of bytes to copy.
> Your comments and questions are welcome on this issue. If we receive enough feedback that
> Text is too restrictive, we are willing to change it, but we need to hear what would be the
> most useful thing to change it to as well.
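
The length pitfall described in the issue can be sketched without any Hadoop dependency. In the sketch below, `GrowableBytes` and `FixedBytes` are hypothetical stand-ins written for illustration; they are not the real hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable, and the 100-byte default is taken from the description above rather than verified against the library.

```java
import java.nio.charset.StandardCharsets;

public class KeyBufferSketch {
    /** Hypothetical growable buffer: the backing array may be longer than the valid data. */
    static final class GrowableBytes {
        private byte[] buf = new byte[100]; // default capacity assumed from the issue text
        private int size = 0;

        void set(byte[] src) {
            if (src.length > buf.length) {
                buf = new byte[src.length];
            }
            System.arraycopy(src, 0, buf, 0, src.length);
            size = src.length;
        }

        byte[] get() { return buf; }      // whole backing array, padding included
        int getSize() { return size; }    // number of valid bytes
    }

    /** Hypothetical fixed-size buffer: holds exactly the bytes it was given. */
    static final class FixedBytes {
        private final byte[] buf;
        FixedBytes(byte[] src) { buf = src.clone(); }
        byte[] get() { return buf; }
    }

    // Ignores the valid length, so the padding ends up in the String.
    static String naiveDecode(GrowableBytes b) {
        return new String(b.get(), StandardCharsets.UTF_8);
    }

    // Honors the valid length.
    static String lengthAwareDecode(GrowableBytes b) {
        return new String(b.get(), 0, b.getSize(), StandardCharsets.UTF_8);
    }

    // No length bookkeeping needed: the array is exactly the key.
    static String decode(FixedBytes b) {
        return new String(b.get(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = "row-1".getBytes(StandardCharsets.UTF_8);
        GrowableBytes g = new GrowableBytes();
        g.set(key);
        System.out.println(naiveDecode(g).length());        // 100: key plus 95 NUL characters
        System.out.println(lengthAwareDecode(g).length());  // 5
        System.out.println(decode(new FixedBytes(key)).equals("row-1")); // true
    }
}
```

With the growable buffer, every caller has to remember the valid length; with the fixed-size buffer, the array itself is the key.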

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

