hadoop-common-dev mailing list archives

From "Gordon Sommers (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-6883) Text.toString violates its abstraction
Date Tue, 27 Jul 2010 14:05:16 GMT
Text.toString violates its abstraction

                 Key: HADOOP-6883
                 URL: https://issues.apache.org/jira/browse/HADOOP-6883
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.1
         Environment: Linux
            Reporter: Gordon Sommers

I stumbled upon this when encoding a Google protocol buffer in base64 and storing it in a
Text object for serialization. Compare the following two lines:

byte [] decoded = b64.decode(val.getBytes());
//this does not return the same bytes as below, and the result, after decoding the base64 successfully,
//is a very mangled protocol buffer

byte [] decoded = b64.decode(val.toString().getBytes());
//YES, toString() FIXES IT

Elsewhere in my code I also have: 
Text curline = new Text(values.next().toString());
byte [] raw = base64.decode(curline.getBytes());
//This does work.

It looks like the Text object must be toString'd (just once, somewhere, even if it's later
repacked in a Text) before it will have the proper byte representation. I would classify this
as a leaky abstraction and ask that the cause be isolated and the API fixed somehow,
so that other developers don't have to spend 3 days figuring out why Text.getBytes() isn't returning
the right bytes even though Text.toString() prints exactly the right string representation and
Text.toString().getBytes() does return the right bytes.
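
A possible explanation, not confirmed anywhere in this report: the Text javadoc says that getBytes() returns the raw backing array and that only the data up to getLength() is valid. If a Text instance is reused for a shorter value, the array can still carry stale bytes past the valid length, while toString() decodes only the valid range. The sketch below models that behavior with a hypothetical ReusableText class. It is a stand-in written for illustration, not Hadoop's Text, and the names ReusableText and validBytes are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingArrayDemo {

    /** Hypothetical stand-in for a reusable text container; NOT Hadoop's Text. */
    static final class ReusableText {
        private byte[] bytes = new byte[0]; // backing array, may be longer than the value
        private int length = 0;             // number of valid bytes in the backing array

        /** Stores a new value, reusing the backing array when it is large enough. */
        void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > bytes.length) {
                bytes = new byte[utf8.length];
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        /** Returns the raw backing array; only the first getLength() bytes are valid. */
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }

        /** Decodes only the valid range, which is why the string looks right. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Copies just the valid prefix of the backing array. */
    static byte[] validBytes(ReusableText t) {
        return Arrays.copyOf(t.getBytes(), t.getLength());
    }

    public static void main(String[] args) {
        ReusableText val = new ReusableText();
        val.set("SGVsbG8gd29ybGQ="); // a 16-character value fills the backing array
        val.set("QUJD");             // a shorter value reuses the same array

        // The raw array still holds stale bytes from the earlier, longer value.
        System.out.println("raw length:   " + val.getBytes().length);
        // Both of these see only the valid bytes.
        System.out.println("valid length: " + validBytes(val).length);
        System.out.println("toString():   " + val.toString());
    }
}
```

If this is indeed the cause, passing only the first getLength() bytes of val.getBytes() to the decoder should give the same result as val.toString().getBytes(). It would also fit the third snippet above, where a Text freshly built from a String has nothing stale behind the value.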

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
