hadoop-hdfs-issues mailing list archives

From "Stuart Smith (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1169) Can't read binary data off HDFS via thrift API
Date Thu, 26 Aug 2010 22:44:55 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903118#action_12903118
] 

Stuart Smith commented on HDFS-1169:
------------------------------------

I think I know enough to make this change and do some unit testing, but I need a little Java
guidance (on building everything).

Mainly, I need help on compiling the hadoopthriftapi.jar file from the gen-java files.

I actually really need this for my own uses.

My first take, outlined below, starts by just converting the existing read/write methods to use
binary (rather than adding new methods). This way I don't have to worry about making sure the correct
read/write methods are called in the initial version.

I re-generated the Thrift Java files with a new Thrift interface that reads/writes in binary.

- Note that binary data is converted to UTF-8 on write as well as on read, so if you only update
the thrift client to write binary, the server will add Unicode escape characters before the data is
even saved to HDFS.
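
To illustrate the conversion problem, here is a minimal standalone sketch (not the actual HadoopThriftServer code; the class and method names are made up for illustration). It assumes the mangling comes from decoding raw bytes as UTF-8 text and re-encoding them, and compares that with passing the bytes through a ByteBuffer untouched:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the Hadoop source.
public class BinaryRoundTrip {

    // Lossy path: treat raw bytes as UTF-8 text, then encode back to bytes.
    // Any byte sequence that is not valid UTF-8 is replaced with U+FFFD.
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless path: carry the bytes in a ByteBuffer with no charset step.
    static byte[] viaByteBuffer(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Format bytes as space-separated hex, e.g. "0xef 0xbf 0xbd".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xa2};
        System.out.println(hex(viaString(original)));     // 0xef 0xbf 0xbd
        System.out.println(hex(viaByteBuffer(original))); // 0xa2
    }
}
```

The single byte 0xa2 is not valid UTF-8 on its own, so the String path turns it into the three-byte encoding of the replacement character, which matches what the issue description reports.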

The code in:

hadoop-0.20.2/src/contrib/thriftfs/src/java/org/apache/hadoop/thriftfs/HadoopThriftServer.java

is straightforward as well.

However, this implements the interface defined in:

org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem.Iface

And even though I update the source in:

hadoop-0.20.2/src/contrib/thriftfs/gen-java

I get an error about overriding the read/write methods incorrectly, so it appears to be pulling
the definition of the

org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem.Iface

from hadoopthriftapi.jar (which makes sense).

However, I don't know how to rebuild hadoopthriftapi.jar.

I'll attach the thrift file and the HadoopThriftServer.java file a little later, but I just
wanted to get this comment up - maybe someone can give me simple instructions on how to build
hadoopthriftapi.jar from the gen-java files?


> Can't read binary data off HDFS via thrift API
> ----------------------------------------------
>
>                 Key: HDFS-1169
>                 URL: https://issues.apache.org/jira/browse/HDFS-1169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.2
>            Reporter: Erik Forsberg
>
> Trying to access binary data stored in HDFS (in my case, TypedByte files generated by
> Dumbo) via thrift talking to org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get
> back is mangled. For example, when I read a file which contains the value 0xa2, it's coming
> back as 0xef 0xbf 0xbd, also known as the Unicode replacement character.
> I think this is because the read method in HadoopThriftServer.java is trying to convert
> the data read from HDFS into UTF-8 via the String() constructor.
> This essentially makes the HDFS thrift API useless for me :-(.
> Not being an expert on Thrift, but would it be possible to modify the API so that it
> uses the binary type listed on http://wiki.apache.org/thrift/ThriftTypes?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

