hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3754) Support a Thrift Interface to access files/directories in HDFS
Date Tue, 05 Aug 2008 16:30:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619947#action_12619947 ]

Pete Wyckoff commented on HADOOP-3754:


But, a few more nits:

1. I do think that requiring people to download and compile Thrift will be too much of a hassle,
given that the compiler is in C++, so checking in the generated code really is the way to
go - I think :)  Of course, this also means checking in the needed libraries for the various
languages - libthrift.jar, thrift.so, thrift.py, and so on.  We could still require users to
build Thrift themselves; it just makes things more of a hassle for them, and in that case I
think we need a README that tells people how to do it. Also, why do we need to check in the
limited_reflection header if the user has to download Thrift anyway?

2. The exceptions thrown by the library are very general and do not match the client lib
(e.g., IOException), although this could be a later add-on.

3. Add a note saying that chown is not atomic, i.e., the group could in theory change between
the get and the set (see the sketch after this list).

4. I think copy-from-local would be more robust if one could optionally pass a checksum, so
the server can make sure it's looking at the right file; if it isn't, and/or the path does not
exist, a meaningful exception should be thrown. But again, this could be a later add-on.

5. Not needed now, but the command line isn't very robust to errors, or friendly about printing
them out in a meaningful, user-friendly way.

6. Generally, a README that explains what this is would help, and/or a bigger release note.

7. Not now, but I would be super, super interested in knowing the performance of reads/writes
through this server.

8. As we saw with the metastore, it would be cool to have an optional minimum number of threads
in the worker pool.

9. I don't quite understand why src/contrib/build-contrib.xml needs to change to add this.

10. It would be better to inherit from thrift/src/contrib/fb303, which could be done later,
and then include counts for each operation.
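
To illustrate point 3: a minimal sketch, in Python, of why a chown that sets only the owner
is a get followed by a set. The client methods used here (stat, setOwner) are assumptions for
illustration, not the actual API in the patch:

    # Sketch only: stat/setOwner are hypothetical method names.
    def chown(client, path, new_owner):
        status = client.stat(path)  # get: read the current group
        # ...another client could chgrp(path) right here...
        client.setOwner(path, new_owner, status.group)  # set: may clobber the newer group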

But, this is a killer application, since no Java or Hadoop is needed on the client whatsoever!
Congratulations! It would be cool even to use the bindings from a thin client to show that
none of Hadoop is needed.
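
For example, a minimal sketch of such a thin Python client. The generated module name
(hadoopfs), service name (ThriftHadoopFileSystem), port, and the listStatus method are all
assumptions about the generated code, not taken from the patch:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hadoopfs import ThriftHadoopFileSystem  # assumed name of the generated module

    transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHadoopFileSystem.Client(protocol)

    transport.open()
    print(client.listStatus('/'))  # assumed method; no Java or Hadoop on the client
    transport.close()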

I would really, really love to see:

List<BlockAddresses> readBlocks(string filename) throws IOException;
List<BlockAddresses> writeBlocks(string filename, i64 length) throws IOException;

which give you access to reading/writing directly from the data node over TCP :)
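
A sketch of how a thin client might use that, continuing the Python example above; the
BlockAddresses fields (host, port, blockId) and the direct datanode protocol are hypothetical:

    blocks = client.readBlocks('/user/pete/bigfile')  # proposed call, not in the patch
    for b in blocks:
        # each entry would name a datanode holding one block, so the client
        # could open a plain TCP socket to it and stream the bytes itself
        print(b.host, b.port, b.blockId)  # hypothetical fields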

Overall, this looks very good for a first cut.


> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>                 Key: HADOOP-3754
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3754
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
> Thrift is a cross-language RPC framework. It supports automatic code generation for a
> variety of languages (Java, C++, Python, PHP, etc.). It would be nice if the HDFS APIs were
> exposed through Thrift. It would allow applications written in any programming language to
> access HDFS.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
