hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-82) row keys should be array of bytes with a specified comparator
Date Thu, 15 May 2008 21:34:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597277#action_12597277

stack commented on HBASE-82:

Thanks for the review.

On DELIMITER, regards HRI, we only parse the name in one place figuring the table name from
region name.  Looking at how we compose region names, the parse is safe since it ranges over
the table name portion which has been checked for legal characters.

For HSK, parses start at the buffers's zeroth byte.  Passed buffers will be full column name
or family name.  Family names are also checked to make sure they have legal characters --
i.e. non-DELIMITER characters.

As best as I can tell, things are no worse than they were previously.

On legal table name, I fixed the javadoc (and made it more clear that this is for user-space
tables, not catalog tables).

> HConnectionManager, RegExpRowFilter, HbaseMapWritable: why change from HashMap to SortedMap?
Is compareTo cheaper than computing hash? Not if map key is the computed hash of the byte

On why SortedMaps rather than HashMaps, its because you can't use byte arrays as keys in HashMaps;
or, rather, the hashcode for a byte array is that of the Object so two different byte arrays
will not equate, even though they might have same content (but I think you know this, or at
least, if you didn't, you do after reviewing this patch).  I thought that I'd gotten all places
where I could get away with keeping HashMaps and using hash of byte array key as HashMap key
but I missed the one in HConnectionManager.  I'll changed it back to HashMap with hash for
of byte arrays for key (profiling yesterday, the lookup against this map of regions looking
for cached regions on clientside takes 15% of all CPU doing writes -- most of time is in the
SoftSortedMap -- this change should help a little).  RegExpRowFilter needs keys to be column
names as does HbaseMapWritable (users of the latter want to get Map.Entry, etc. They'd be
baffled if instead of the key they entered, instead they got an integer.

On Bytes.mapKey instead of Arrays.hashCode, the former does the boxing to Integer and uses
the hashing mechanism that was in place before this patch -- the one Text used.  Any reason
for using Arrays.hashcode instead?

On Maps converted to Sets instead, i got a few of these already.  If you noticed any I missed,
I'd be interested.  I was certainly looking for opportunity (though, IIRC, Sets are often
implemented atop Maps anyways so savings, if any, are minimal).

On changing regionServerStartup, I needed to pass Configuration; keys of WritableComparable.
 Rather than do convertions on both sides, it was simplier to just use MapWritable for this
one time operation.

There is another issue for managing any migrations associated with this issue.  This patch
is big enough as it is.  Migration work will be easier with it committed.

Thanks for the pointer to Byte.SIZE.  I changed Bytes.SIZEOF_LONG to do as you suggest.

May I commit?

> row keys should be array of bytes with a specified comparator
> -------------------------------------------------------------
>                 Key: HBASE-82
>                 URL: https://issues.apache.org/jira/browse/HBASE-82
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: Jim Kellerman
>            Assignee: stack
>             Fix For: 0.2.0
>         Attachments: 82-v12-ignore-ws.patch, 82-v13-ignore-ws.patch, 82-v2.patch, 82-v3.patch,
82-v4.patch, 82-v5.patch, 82-v7.patch, 82-v8.patch, 82-v9-ignore-ws.patch, 82.patch, Perf.java
> I have heard from several people that row keys in HBase should be less restricted than
> What do you think?
> At the very least, a row key has to be a WritableComparable. This would lead to the most
general case being either hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable. The
primary difference between these two classes is that hadoop.io.BytesWritable by default allocates
100 bytes and if you do not pay attention to the length, (BytesWritable.getSize()), converting
a String to a BytesWritable and vice versa can become problematic. 
> hbase.io.ImmutableBytesWritable, in contrast only allocates as many bytes as you pass
in and then does not allow the size to be changed.
> If we were to change from Text to a non-text key, my preference would be for ImmutableBytesWritable,
because it has a fixed size once set, and operations like get, etc do not have to something
like System.arrayCopy where you specify the number of bytes to copy.
> Your comments, questions are welcome on this issue. If we receive enough feedback that
Text is too restrictive, we are willing to change it, but we need to hear what would be the
most useful thing to change it to as well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message