hadoop-hdfs-dev mailing list archives

From "Erik Forsberg (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1169) Can't read binary data off HDFS
Date Fri, 21 May 2010 09:45:16 GMT
Can't read binary data off HDFS
-------------------------------

                 Key: HDFS-1169
                 URL: https://issues.apache.org/jira/browse/HDFS-1169
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: contrib/thriftfs
    Affects Versions: 0.20.2
            Reporter: Erik Forsberg


When I access binary data stored in HDFS (in my case, TypedByte files generated by Dumbo)
via thrift, talking to org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is
mangled. For example, when I read a file which contains the byte 0xa2, it comes back as
0xef 0xbf 0xbd, which is the UTF-8 encoding of the Unicode replacement character (U+FFFD).

I think this is because the read method in HadoopThriftServer.java decodes the bytes read
from HDFS as UTF-8 text via the String() constructor, which substitutes the replacement
character for any byte sequence that is not valid UTF-8.
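
The suspected behaviour can be reproduced outside Hadoop. The following is a minimal sketch, not the actual HadoopThriftServer code: the class and method names are made up for illustration, and passing the charset explicitly is an assumption (the server may rely on the platform default instead). It decodes a single 0xa2 byte as UTF-8 and encodes the resulting String again.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop or Thrift.
class Utf8MangleDemo {

    // Decode raw bytes as UTF-8 text, then encode the text back to bytes.
    // Byte sequences that are not valid UTF-8 become U+FFFD during decoding.
    static byte[] roundTripThroughString(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as space-separated lowercase hex for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xa2};  // a lone 0xa2 is not valid UTF-8
        System.out.println(toHex(roundTripThroughString(raw)));  // prints "ef bf bd"
    }
}
```

The original byte cannot be recovered from the output, so any binary payload that passes through a String this way is lost, not just re-encoded.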

This essentially makes the HDFS thrift API useless for me :-(.

I'm not an expert on Thrift, but would it be possible to modify the API so that it uses
the binary type listed on http://wiki.apache.org/thrift/ThriftTypes?
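
As a rough illustration of what a binary-typed read could look like on the Java side: Thrift's Java bindings hand a binary field to the application as raw bytes (a byte[] or a java.nio.ByteBuffer, depending on the Thrift version), with no charset decoding. The sketch below assumes the ByteBuffer form. The class, the readAsBinary method and its parameters are hypothetical and do not come from the thriftfs API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; not the thriftfs API.
class BinaryReadSketch {

    // Stand-in for a read call declared with a binary return type:
    // returns up to `length` bytes starting at `offset`, untouched.
    static ByteBuffer readAsBinary(byte[] source, int offset, int length) {
        int start = Math.min(offset, source.length);
        int end = Math.min(source.length, start + length);
        return ByteBuffer.wrap(Arrays.copyOfRange(source, start, end));
    }

    // Copy the remaining bytes out of the buffer without moving its position.
    static byte[] toArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] fileContents = {(byte) 0xa2, 0x00, (byte) 0xff};
        byte[] received = toArray(readAsBinary(fileContents, 0, 3));
        // The bytes arrive exactly as stored, including 0xa2.
        System.out.println(Arrays.equals(fileContents, received));
    }
}
```

Because no String is involved at any point, bytes that are not valid UTF-8 pass through unchanged.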

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

