hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3754) Support a Thrift Interface to access files/directories in HDFS
Date Tue, 05 Aug 2008 16:30:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619947#action_12619947
] 

Pete Wyckoff commented on HADOOP-3754:
--------------------------------------

+1

but, a few more nits:

1. I do think that requiring people to download and compile Thrift will be too much of a hassle,
given that the compiler is in C++, so checking in the generated code really is the way to
go - I think :)  Of course, this requires checking in the needed libraries in various
languages - libthrift.jar, thrift.so, thrift.py, ...  We could still require users to build
Thrift themselves; it just makes things more of a hassle, and in that case I think we need a
README that tells people how to do it. Also, why do we need to check in the limited_reflection
header if the user has to download Thrift anyway?

2. The exceptions thrown by the library are very general and do not match the client lib's -
e.g., IOException, ... - although this could be a later add-on.

3. A note saying that chown is not atomic - i.e., the group could in theory change between
the get and the set.
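The race above can be sketched in a few lines. The FakeFs class and its method names below are hypothetical stand-ins for the Thrift client calls, not the actual generated interface:

```python
# Tiny sketch of the non-atomic chown race: a chown that only wants to
# change the owner must first read the group, then write both back.
# "FakeFs" and its methods are hypothetical, NOT the real Thrift API.

class FakeFs:
    def __init__(self, owner, group):
        self.owner, self.group = owner, group

    def get_group(self, path):
        return self.group

    def set_owner_and_group(self, path, owner, group):
        self.owner, self.group = owner, group

fs = FakeFs("alice", "staff")

# chown("/f", "bob") implemented as a get followed by a set:
stale_group = fs.get_group("/f")                  # reads "staff"
fs.set_owner_and_group("/f", "carol", "eng")      # concurrent client!
fs.set_owner_and_group("/f", "bob", stale_group)  # writes "staff" back

# The concurrent change of the group to "eng" was silently undone.
```

This is exactly the window the note should warn about: between the get and the set, another client's update can be lost.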

4. I think copyFromLocal would be more robust if one could optionally add a checksum, so
the server could ensure it's looking at the right file; if it isn't, and/or the path does not
exist, a meaningful exception would be thrown. Again, this could be a later add-on.
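A minimal sketch of the optional-checksum idea: the client sends a digest along with the path, and the server recomputes and compares before trusting the file. The function names and the choice of MD5 are illustrative assumptions, not the actual API:

```python
# Sketch of optional checksum verification for copyFromLocal.
# md5_of/server_verify are hypothetical helper names for illustration.

import hashlib

def md5_of(data: bytes) -> str:
    # Digest the file contents the client is about to hand over.
    return hashlib.md5(data).hexdigest()

def server_verify(received: bytes, claimed_md5: str) -> None:
    # Server side: recompute and compare before accepting the file.
    actual = md5_of(received)
    if actual != claimed_md5:
        # A meaningful, typed error instead of a generic failure.
        raise ValueError(f"checksum mismatch: {actual} != {claimed_md5}")

payload = b"hello"
server_verify(payload, md5_of(payload))  # matching digest passes
```

A mismatched digest would raise immediately, giving the caller the "meaningful exception" asked for above.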

5. Not needed now, but the command line isn't very robust to errors or friendly about printing
them out in a meaningful, user-friendly way.

6. Generally, a README that explains what this is, and/or a bigger release note.

7. Not now, but I would be super, super interested in knowing the performance of reads/writes
from this server.

8. As we saw with the metastore, it would be cool to have an optional minimum number of threads
in the worker pool.

9. I don't quite understand why src/contrib/build-contrib.xml needs to change for adding this?

10. It would be better to inherit from thrift/src/contrib/fb303 and then include counts for
each operation, but this could be done later.
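The per-operation counters an fb303-style base service would expose boil down to simple bookkeeping like the sketch below; this illustrates the idea only and is not the real fb303 interface:

```python
# Sketch of fb303-style per-operation counters (incrementCounter /
# getCounters). The OpCounters class is a hypothetical illustration.

from collections import defaultdict

class OpCounters:
    def __init__(self):
        self._counts = defaultdict(int)

    def increment(self, op: str, by: int = 1) -> None:
        # Bump the counter for one RPC operation, e.g. "listStatus".
        self._counts[op] += by

    def get_counters(self) -> dict:
        # Snapshot of all counters, as fb303's getCounters returns.
        return dict(self._counts)

counters = OpCounters()
counters.increment("listStatus")
counters.increment("listStatus")
counters.increment("mkdirs")
# counters.get_counters() -> {"listStatus": 2, "mkdirs": 1}
```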

But this is a killer application, since no Java or Hadoop is needed on the client whatsoever!
Congratulations! It would be cool even to use the Java bindings from a thin client, to show
there's no need for all of Hadoop.

I would really, really love to see:

List<BlockAddresses> readBlocks(string filename) throws IOException;
List<BlockAddresses> writeBlocks(string filename, i64 length) throws IOException;

which give you access to reading/writing directly from the data node over TCP :)
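What those calls would hand back is essentially a list of per-block extents that a thin client could then fetch from the datanodes directly. The helper below and the 64-byte block size in the example are assumptions for illustration only:

```python
# Illustration of the (offset, length) extents that a hypothetical
# readBlocks/writeBlocks call might describe for a file.

def block_ranges(file_length: int, block_size: int):
    """Split a file of file_length bytes into block-sized extents."""
    ranges = []
    offset = 0
    while offset < file_length:
        length = min(block_size, file_length - offset)
        ranges.append((offset, length))
        offset += length
    return ranges

# A 130-byte file with a 64-byte block size spans three blocks:
# block_ranges(130, 64) -> [(0, 64), (64, 64), (128, 2)]
```

Each extent, paired with the addresses of the datanodes holding that block, is all a client needs to read or write over TCP without going through the Thrift server for the data path.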

Overall looks very good on the first cut.

pete



> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
>                 Key: HADOOP-3754
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3754
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code generation for a
> variety of languages (Java, C++, Python, PHP, etc.). It would be nice if the HDFS APIs were
> exposed through Thrift. It will allow applications written in any programming language to access HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

