hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8865) HBase shell split command acts incorrectly with hex split keys.
Date Fri, 12 Jul 2013 17:51:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707149#comment-13707149 ]

Nick Dimiduk commented on HBASE-8865:
-------------------------------------

After reading through this issue more carefully and also looking at HBASE-6643, I think users
of the shell would expect all commands that interact with byte[] values to be processed through
{{Bytes.toBytesBinary}} and results printed using {{Bytes.toStringBinary}}. The trouble with
{{toBytesBinary}} is that it doesn't take the extra step of UTF-8 encoding non-escaped
characters.

{code}
      } else {
        b[size++] = (byte) ch;
      }
{code}

That cast to {{byte}} from {{ch}} should instead be the equivalent of:

{noformat}
String.valueOf(ch).getBytes("UTF-8");
{noformat}

I think using this patch as-is will break splits for split points containing non-escaped Unicode
characters (e.g., ü), because each one is cast to a single {{byte}}.
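To make the difference concrete, here is a minimal sketch that contrasts the narrowing cast with UTF-8 encoding, plus a small parser in the spirit of the suggestion. It is not the HBase implementation: the class {{BinaryKeySketch}} and its methods {{castEncode}}, {{utf8Encode}} and {{parse}} are hypothetical names introduced only for illustration, and the parser assumes that a well-formed {{\xNN}} escape means one raw byte while everything else is UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not code from HBase's Bytes class.
class BinaryKeySketch {

    /** Narrowing cast: keeps only the low 8 bits of the char. */
    public static byte[] castEncode(char ch) {
        return new byte[] { (byte) ch };
    }

    /** UTF-8 encoding of a single char, as suggested in the comment. */
    public static byte[] utf8Encode(char ch) {
        return String.valueOf(ch).getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Assumed parsing rule: "\xNN" (two hex digits) becomes one raw byte;
     * any other character is written as its UTF-8 encoding.
     */
    public static byte[] parse(String in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length()) {
            boolean isEscape = in.startsWith("\\x", i)
                    && i + 3 < in.length()
                    && Character.digit(in.charAt(i + 2), 16) >= 0
                    && Character.digit(in.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                // Escaped byte: copy the value through unchanged.
                out.write(Integer.parseInt(in.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Non-escaped character: encode the whole code point as UTF-8.
                int cp = in.codePointAt(i);
                byte[] enc = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(enc, 0, enc.length);
                i += Character.charCount(cp);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+00FC (u with diaeresis): 1 byte via cast, 2 bytes via UTF-8.
        System.out.println(castEncode('\u00fc').length);
        System.out.println(utf8Encode('\u00fc').length);
        // Three escapes yield three raw bytes.
        System.out.println(parse("\\x00\\x00\\xC3").length);
    }
}
```

The sketch walks the input by code point rather than by {{char}}, so that characters outside the Basic Multilingual Plane are encoded as one unit; that is a design choice of this example, not something the comment above specifies.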
                
> HBase shell split command acts incorrectly with hex split keys.
> ---------------------------------------------------------------
>
>                 Key: HBASE-8865
>                 URL: https://issues.apache.org/jira/browse/HBASE-8865
>             Project: HBase
>          Issue Type: Bug
>          Components: shell, Usability
>    Affects Versions: 0.94.8
>         Environment: Linux
>            Reporter: Ding Haifeng
>         Attachments: 8865.txt
>
>
> When I tried to do a manual region split from the HBase shell, I found that the split command acts incorrectly with hex split keys.
> Here is an example.
> I executed hbase(main):003:0> split 'tsdb', "\x00\x00\xC3".
> While I expected it to split at the 3-byte key "\x00\x00\xC3", it actually split at the 5-byte key "\x00\x00\xEF\xBF\xBD".
> I tested more split keys and found some patterns:
> * If all the bytes in the split key are between "\x00" and "\x7F", it works as expected and splits at exactly the key specified.
> * If any byte is between "\x80" and "\xFF", it works incorrectly. Whatever the byte is, it is interpreted as "\xEF\xBF\xBD". Here is another example: specifying split key "\x00\xA0\x00\xB0" actually splits at "\x00\xEF\xBF\xBD\x00\xEF\xBF\xBD".
> I'm running HBase 0.94.8, r1485407, both server-side and client-side.
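
The pattern reported above matches what happens when raw bytes are decoded as UTF-8 and then re-encoded: each byte that does not form a valid UTF-8 sequence is replaced by U+FFFD, whose UTF-8 encoding is "\xEF\xBF\xBD". The report does not say where such a conversion happens in the shell, so the following is only a sketch of one plausible mechanism; the class {{ReplacementCharDemo}} and its methods are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; assumes the key passes through a UTF-8 String somewhere.
class ReplacementCharDemo {

    /** Decodes raw bytes as UTF-8, then encodes the resulting String back to UTF-8. */
    public static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8); // malformed input becomes U+FFFD
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes in the \xNN notation used in the report. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key1 = { 0x00, 0x00, (byte) 0xC3 };
        byte[] key2 = { 0x00, (byte) 0xA0, 0x00, (byte) 0xB0 };
        System.out.println(hex(roundTrip(key1))); // prints \x00\x00\xEF\xBF\xBD
        System.out.println(hex(roundTrip(key2))); // prints \x00\xEF\xBF\xBD\x00\xEF\xBF\xBD
    }
}
```

Bytes in the range "\x00" to "\x7F" are valid single-byte UTF-8 sequences and survive the round trip unchanged, which is consistent with the first bullet in the report.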

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
