hadoop-hdfs-issues mailing list archives

From "Stuart Smith (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1169) Can't read binary data off HDFS via thrift API
Date Fri, 27 Aug 2010 01:40:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903178#action_12903178 ]

Stuart Smith commented on HDFS-1169:
------------------------------------

oy. didn't format the code, sorry:

{noformat} 
    /**
     * write to a file
     */
    public boolean write(ThriftHandle tout, String encodedData) throws ThriftIOException {
      try {
        now = now();
        HadoopThriftHandler.LOG.debug("write: " + tout.id);
        FSDataOutputStream out = (FSDataOutputStream)lookup(tout.id);
        Base64 base64 = new Base64();
        // The client sends Base64 text; decode it back to raw bytes before writing.
        byte[] tmp = base64.decode(encodedData.getBytes("UTF-8"));

        out.write(tmp, 0, tmp.length);
        HadoopThriftHandler.LOG.debug("wrote: " + tout.id);
        return true;
      } catch (IOException e) {
        throw new ThriftIOException(e.getMessage());
      }
    }

    /**
     * read from a file
     */
    public String read(ThriftHandle tout, long offset,
                       int length) throws ThriftIOException {
      try {
        now = now();
        HadoopThriftHandler.LOG.debug("read: " + tout.id +
                                     " offset: " + offset +
                                     " length: " + length);
        FSDataInputStream in = (FSDataInputStream)lookup(tout.id);
        if (in.getPos() != offset) {
          in.seek(offset);
        }
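        byte[] tmp = new byte[length];
        int numbytes = in.read(offset, tmp, 0, length);
        HadoopThriftHandler.LOG.debug("read done: " + tout.id);
        if (numbytes <= 0) {
          // End of file (or nothing read): return an empty payload.
          return "";
        }
        // Encode only the bytes actually read; a short read would otherwise
        // append the zero-filled tail of the buffer to the result.
        byte[] data = new byte[numbytes];
        System.arraycopy(tmp, 0, data, 0, numbytes);
        Base64 base64 = new Base64();
        // encode(byte[]) declares no EncoderException, so the server no longer
        // needs a catch block that calls System.exit().
        return new String(base64.encode(data), "UTF-8");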
      } catch (IOException e) {
        throw new ThriftIOException(e.getMessage());
      }
    }

{noformat} 
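For anyone who wants to check the idea outside the Thrift server, here is a minimal standalone sketch of the same Base64 round trip. It is only an illustration under some assumptions: it uses java.util.Base64 from the JDK (Java 8+) in place of the commons-codec Base64 class used above, and the class and method names (Base64RoundTripDemo, encodeRead, decodePayload) are made up for the example. It also trims the buffer to the number of bytes actually read, so a short read does not add padding bytes.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTripDemo {

    // Server side (hypothetical helper): encode only the bytes that were read.
    // numBytes mirrors the return value of a read call: it may be smaller than
    // buffer.length, or -1 at end of file.
    static String encodeRead(byte[] buffer, int numBytes) {
        if (numBytes <= 0) {
            return "";
        }
        byte[] data = Arrays.copyOf(buffer, numBytes);
        return Base64.getEncoder().encodeToString(data);
    }

    // Client side (hypothetical helper): turn the string payload back into bytes.
    static byte[] decodePayload(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // An 8-byte buffer of which only the first 3 bytes were filled by the read.
        byte[] buffer = new byte[8];
        buffer[0] = (byte) 0xa2;
        buffer[1] = (byte) 0x00;
        buffer[2] = (byte) 0xff;

        String encoded = encodeRead(buffer, 3);
        byte[] decoded = decodePayload(encoded);

        System.out.println("encoded payload: " + encoded);
        System.out.println("round trip ok:   "
                + Arrays.equals(decoded, Arrays.copyOf(buffer, 3)));
    }
}
```

The cost of this approach is that the payload grows by roughly a third, and every client has to know that the string is Base64 rather than plain text.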

> Can't read binary data off HDFS via thrift API
> ----------------------------------------------
>
>                 Key: HDFS-1169
>                 URL: https://issues.apache.org/jira/browse/HDFS-1169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.2
>            Reporter: Erik Forsberg
>         Attachments: hadoopfs.thrift, HadoopThriftServer.java
>
>
> Trying to access binary data stored in HDFS (in my case, TypedByte files generated by Dumbo) via thrift talking to org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is mangled. For example, when I read a file which contains the value 0xa2, it's coming back as 0xef 0xbf 0xbd, the UTF-8 encoding of the Unicode replacement character (U+FFFD).
> I think this is because the read method in HadoopThriftServer.java is trying to convert the data read from HDFS into UTF-8 via the String() constructor.
> This essentially makes the HDFS thrift API useless for me :-(.
> I'm not an expert on Thrift, but would it be possible to modify the API so that it uses the binary type listed on http://wiki.apache.org/thrift/ThriftTypes?
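
The mangling described in the report can be reproduced without HDFS or Thrift. The sketch below is an illustration, not the server code: the class and method names (Utf8ManglingDemo, throughUtf8String) are hypothetical, and it simply pushes a raw byte through a Java String decoded as UTF-8 and back again, which is presumably what the String() constructor in the read path amounts to.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ManglingDemo {

    // Decode raw bytes as UTF-8 text and encode them again. Byte sequences that
    // are not valid UTF-8 are replaced with U+FFFD during decoding.
    static byte[] throughUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as space-separated hex values, e.g. "0xef 0xbf 0xbd".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xa2 };
        // Prints: 0xa2 -> 0xef 0xbf 0xbd
        System.out.println(hex(raw) + " -> " + hex(throughUtf8String(raw)));
    }
}
```

A lone 0xa2 is a UTF-8 continuation byte with no lead byte, so the decoder cannot map it to a character and substitutes the replacement character instead. The original value is lost at that point, which is why the data has to travel either as Base64 text or as a Thrift binary field.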

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

