hbase-issues mailing list archives

From "Aditya Kishore (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6991) Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary()
Date Wed, 17 Oct 2012 05:14:03 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aditya Kishore updated HBASE-6991:
----------------------------------

    Fix Version/s: 0.96.0
     Hadoop Flags: Incompatible change
           Status: Patch Available  (was: Open)

The patch includes the following changes:

1. Gets rid of the unnecessary byte[] to String conversion. The "ISO-8859-1" charset maps each
byte to the char with the same value, so the conversion changes nothing. This also removes the
need for the try-catch block.
{code}
-    String first = new String(b, off, len, "ISO-8859-1");
-    for (int i = 0; i < first.length() ; ++i ) {
-      int ch = first.charAt(i) & 0xFF;

+    for (int i = off; i < off + len ; ++i ) {
+      int ch = b[i] & 0xFF;
{code}
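
To back up the claim in item 1, here is a small standalone check (not part of the patch). It shows that decoding with ISO-8859-1 and masking each char with 0xFF gives the same values as masking the bytes directly. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: checks that ISO-8859-1 decoding
// preserves every byte value, which is why the String round trip is redundant.
public class Latin1Identity {

  // Returns true if, at every index, the decoded char has the same value
  // as the unsigned value of the corresponding byte.
  static boolean decodesToSameValues(byte[] b) {
    String s = new String(b, StandardCharsets.ISO_8859_1);
    if (s.length() != b.length) {
      return false;
    }
    for (int i = 0; i < b.length; i++) {
      if ((s.charAt(i) & 0xFF) != (b[i] & 0xFF)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Cover all 256 possible byte values.
    byte[] all = new byte[256];
    for (int i = 0; i < 256; i++) {
      all[i] = (byte) i;
    }
    System.out.println(decodesToSameValues(all));
  }
}
```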

2. Removed "\" from the set of printable non-alphanumeric characters so that it can be escaped
using the "\xXX" format.
{code}
-          || " `~!@#$%^&*()-_=+[]{}\\|;:'\",.<>/?".indexOf(ch) >= 0 ) {

+          || " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?".indexOf(ch) >= 0 ) {
{code}
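
For illustration only, a simplified sketch of the encoding loop with the new character set. It is not the actual Bytes.toStringBinary() source: the class name is made up, and the alphanumeric range checks are assumed from context, since only the tail of the condition appears in the diff above.

```java
// Simplified, hypothetical sketch of the escaping rule after the patch.
public class EscapeSketch {

  // Printable non-alphanumeric characters after the patch: no backslash.
  private static final String PRINTABLE = " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?";

  static String toStringBinary(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte value : b) {
      int ch = value & 0xFF;
      if ((ch >= '0' && ch <= '9')
          || (ch >= 'A' && ch <= 'Z')
          || (ch >= 'a' && ch <= 'z')
          || PRINTABLE.indexOf(ch) >= 0) {
        // Printable: emit the character itself.
        sb.append((char) ch);
      } else {
        // Everything else, now including '\', is written as \xXX.
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toStringBinary(new byte[] {'\\', 'x', 'A', 'D'}));
  }
}
```

With this rule the bytes '\', 'x', 'A', 'D' come out as the eight characters \x5CxAD, so the leading backslash can no longer be mistaken for the start of an escape.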

3. Added a new test case to verify that the conversion is reversible for random arrays of bytes.
Without the fix, the test always fails. The test adds 1 extra second to the test run.

{code:title=hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestBytes.java}
+  public void testToStringBytesBinaryReversible() {
+    //  let's run test with 1000 randomly generated byte arrays
+    Random rand = new Random(System.currentTimeMillis());
+    byte[] randomBytes = new byte[1000];
+    for (int i = 0; i < 1000; i++) {
+      rand.nextBytes(randomBytes);
+      verifyReversibleForBytes(randomBytes);
+    }
+
+    //  some specific cases
+    verifyReversibleForBytes(new  byte[] {});
+    verifyReversibleForBytes(new  byte[] {'\\', 'x', 'A', 'D'});
+    verifyReversibleForBytes(new  byte[] {'\\', 'x', 'A', 'D', '\\'});
+  }
+
+  private void verifyReversibleForBytes(byte[] originalBytes) {
+    String convertedString = Bytes.toStringBinary(originalBytes);
+    byte[] convertedBytes = Bytes.toBytesBinary(convertedString);
+    if (Bytes.compareTo(originalBytes, convertedBytes) != 0) {
+      fail("Not reversible for\nbyte[]: " + Arrays.toString(originalBytes) +
+          ",\nStringBinary: " + convertedString);
+    }
+  }
{code}

4. And finally, fixes the two test cases which were breaking because they assumed that "\"
is encoded as itself ("\") rather than as "\x5C".
{code}
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java

-            + "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\s\\xA0\\x0F\\x00\\x00"
+            + "\\xD46\\xEA5\\xEA3\\xEA7\\xE7\\x00LI\\x5Cs\\xA0\\x0F\\x00\\x00"
{code}

Setting the "Incompatible change" flag since any other code which makes the same assumption
as the two test cases will need to be fixed.
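
To make the incompatibility concrete, here is a rough self-contained sketch of the round trip under the old and the new rule. It is not the HBase implementation: the class, the `backslashIsPrintable` switch, and the simplified decoder are assumptions made for the example. The real Bytes.toBytesBinary() may treat malformed escapes differently.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch comparing the old and new escaping rules.
public class RoundTripSketch {

  private static final String PRINTABLE = " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?";

  // backslashIsPrintable = true mimics the old rule, false the patched rule.
  static String encode(byte[] b, boolean backslashIsPrintable) {
    StringBuilder sb = new StringBuilder();
    for (byte value : b) {
      int ch = value & 0xFF;
      boolean printable = (ch >= '0' && ch <= '9')
          || (ch >= 'A' && ch <= 'Z')
          || (ch >= 'a' && ch <= 'z')
          || PRINTABLE.indexOf(ch) >= 0
          || (backslashIsPrintable && ch == '\\');
      if (printable) {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  // Simplified decoder: "\x" followed by two hex digits becomes one byte,
  // any other character is copied through as a single byte.
  static byte[] decode(String in) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < in.length(); i++) {
      char ch = in.charAt(i);
      if (ch == '\\' && i + 3 < in.length() && in.charAt(i + 1) == 'x'
          && Character.digit(in.charAt(i + 2), 16) >= 0
          && Character.digit(in.charAt(i + 3), 16) >= 0) {
        int hi = Character.digit(in.charAt(i + 2), 16);
        int lo = Character.digit(in.charAt(i + 3), 16);
        out.write(hi * 16 + lo);
        i += 3; // skip the 'x' and both hex digits
      } else {
        out.write((byte) ch);
      }
    }
    return out.toByteArray();
  }

  static boolean reversible(byte[] original, boolean backslashIsPrintable) {
    return Arrays.equals(original, decode(encode(original, backslashIsPrintable)));
  }

  public static void main(String[] args) {
    byte[] original = {'\\', 'x', 'A', 'D'};
    System.out.println("old rule reversible: " + reversible(original, true));
    System.out.println("new rule reversible: " + reversible(original, false));
  }
}
```

Under the old rule the four bytes encode to \xAD, which decodes to the single byte 0xAD. Under the new rule they encode to \x5CxAD and decode back to the original. Any caller that compares against a hard-coded string containing a bare "\" (such as the "\s" in the two tests above) will now see "\x5C" instead.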
                
> Escape "\" in Bytes.toStringBinary() and its counterpart Bytes.toBytesBinary()
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-6991
>                 URL: https://issues.apache.org/jira/browse/HBASE-6991
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.96.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6991_trunk.patch
>
>
> Since "\" is used to escape non-printable characters but is not itself treated as a special character during conversion, it can lead to unexpected results.
> For example, please consider the following code snippet.
> {code}
> public void testConversion() {
>   byte[] original = {
>       '\\', 'x', 'A', 'D'
>   };
>   String stringFromBytes = Bytes.toStringBinary(original);
>   byte[] converted = Bytes.toBytesBinary(stringFromBytes);
>   System.out.println("Original: " + Arrays.toString(original));
>   System.out.println("Converted: " + Arrays.toString(converted));
>   System.out.println("Reversible?: " + (Bytes.compareTo(original, converted) == 0));
> }
> Output:
> -------
> Original: [92, 120, 65, 68]
> Converted: [-83]
> Reversible?: false
> {code}
> The "\" character needs to be treated as special and must be encoded as a non-printable character ("\x5C") to avoid any ambiguity during conversion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
