hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3754) Support a Thrift Interface to access files/directories in HDFS
Date Wed, 06 Aug 2008 07:48:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620164#action_12620164 ]

dhruba borthakur commented on HADOOP-3754:
------------------------------------------

Thanks a bunch, Pete & Nitay, for the detailed comments.

1. The patch includes the Linux thrift binary (lib/thrift/thrift) and lib/thrift/libthrift.jar,
so a build on Linux does not have to download any external libraries or utilities.

2. The proxy server uses the message from the hadoop.IOException to create its own exception.
This is the best we can do for now; we can improve it later if needed. The application
sees the real exception string, so that should be enough for debugging purposes, shouldn't
it?

3. Added a note to chown saying that it is not atomic. This applies only to hdfs.py, not
to the chown Thrift interface.

4. I like your idea of using the checksum all the way from the client, but maybe we can postpone
it to a later date.

5. The python command line needs more work. However, I do not intend the python wrapper
to be used by applications as-is. It is there to demonstrate how to access HDFS from a
python script.

6. Added a README that describes the approach and the build and deployment process. I plan
to write a wiki page once this patch is committed.

7. Performance measurements will come at a later date.

8. Set the default minimum number of threads to 10.

9. The change to build-contrib.xml ensures that the generated jar file(s) are in the CLASSPATH
while compiling HadoopThriftServer.java.

10. I would prefer to wait before including fb303. It is mostly for statistics and process
management and can be added at a later date. It might also be useful to do this through
HadoopMetrics or via HADOOP-3772.


11. I added a new call, setInactiveTimeoutPeriod(), that allows an application to specify how
long the proxy server should remain active after the last call to it. If this timer expires,
the proxy server closes all open files and shuts down. The default inactivity timeout is
1 hour. This does not completely address Nitay's problem, but it may solve it to a certain
extent. If Nitay could merge in his per-handle timer code once this patch is committed,
that would be great.

12. If, at a future time, we add Thrift APIs to the Namenode, Datanode, etc., they would have to
be located in src/hdfs and not in contrib. Even if we decide to keep them in contrib, they
could go in src/contrib/thriftfs/namenode, src/contrib/thriftfs/datanode, etc. I think the API
in this patch should try to resemble the existing API in fs.FileSystem.

13. I added a getFileBlockLocations API to allow fetching the block locations of a file.
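
Regarding item 2, the translation the proxy performs can be pictured as catching the Hadoop-side exception and re-raising a Thrift-level exception that carries only the original message string. The sketch below is in Python purely for illustration (the actual server, HadoopThriftServer.java, is Java). `ThriftIOException`, `translate_errors` and `open_file` are hypothetical names, and Python's built-in `IOError` stands in for Hadoop's IOException.

```python
import functools


class ThriftIOException(Exception):
    """Hypothetical stand-in for the exception type declared in the Thrift IDL."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


def translate_errors(handler):
    """Wrap a handler so Hadoop-side I/O errors reach the client as ThriftIOException.

    Only the message string crosses the RPC boundary: the client sees the real
    error text, but not the original exception type or stack trace.
    """

    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except IOError as e:  # IOError stands in for Hadoop's IOException
            raise ThriftIOException(str(e)) from e

    return wrapper


@translate_errors
def open_file(path: str):
    # Hypothetical handler that always fails, to demonstrate the translation.
    raise IOError(f"File does not exist: {path}")


if __name__ == "__main__":
    try:
        open_file("/no/such/file")
    except ThriftIOException as e:
        print(e.message)  # prints: File does not exist: /no/such/file
```

The trade-off is the one the comment mentions: the string is usually enough for debugging, while a richer mapping (e.g. one Thrift exception per Hadoop exception class) could be added later.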

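Item 11 describes a single, server-wide inactivity timer: every call restarts it, and once it expires the proxy closes all open files and shuts down (default: 1 hour). A minimal Python sketch of that bookkeeping follows. The `InactivityMonitor` class, its method names, the `open_files` map and the injectable clock are assumptions for illustration; `set_inactive_timeout_period` only mimics the described behaviour of setInactiveTimeoutPeriod(), and the comment does not say how the real server schedules the expiry check.

```python
import time
from typing import Callable, Dict, Protocol


class Closable(Protocol):
    def close(self) -> None: ...


class InactivityMonitor:
    """Hypothetical sketch of a server-wide inactivity timeout (item 11)."""

    DEFAULT_TIMEOUT_SECS = 60 * 60  # 1 hour, the default stated in item 11

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._timeout = self.DEFAULT_TIMEOUT_SECS
        self._last_call = clock()
        self.open_files: Dict[int, Closable] = {}  # hypothetical handle -> open file
        self.running = True

    def set_inactive_timeout_period(self, secs: float) -> None:
        # Analogue of setInactiveTimeoutPeriod(); the call itself counts as activity.
        self._timeout = secs
        self.record_call()

    def record_call(self) -> None:
        # Every RPC handled by the proxy would call this to restart the timer.
        self._last_call = self._clock()

    def check(self) -> bool:
        """Close all open files and stop if the timer has expired; return running state."""
        if self.running and self._clock() - self._last_call > self._timeout:
            for handle in self.open_files.values():
                handle.close()
            self.open_files.clear()
            self.running = False
        return self.running
```

A per-handle timer, as Nitay proposed, would instead keep one last-access timestamp per entry in `open_files` and close only the handles that have individually gone idle.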

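Item 13 adds getFileBlockLocations. The fs.FileSystem method of the same name takes a start offset and a length and returns the blocks that overlap that byte range; assuming the Thrift call follows that shape (per item 12), the range-overlap logic can be sketched in Python as below. The `Block` record, its fields, the example file layout and the host names are hypothetical; the real proxy would obtain this information from HDFS rather than compute it itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical block record: offset within the file, length, replica hosts."""
    offset: int
    length: int
    hosts: Tuple[str, ...]


def get_file_block_locations(blocks: List[Block], start: int, length: int) -> List[Block]:
    """Return the blocks overlapping the half-open byte range [start, start + length).

    A zero-length range overlaps nothing, so it returns an empty list.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    if length == 0:
        return []
    end = start + length
    return [b for b in blocks if b.offset < end and start < b.offset + b.length]


MB = 1024 * 1024
# Hypothetical file: two full 64 MiB blocks followed by a 10 MiB tail block.
file_blocks = [
    Block(0, 64 * MB, ("dn1", "dn2", "dn3")),
    Block(64 * MB, 64 * MB, ("dn2", "dn4", "dn5")),
    Block(128 * MB, 10 * MB, ("dn1", "dn3", "dn5")),
]

if __name__ == "__main__":
    # A 1 MiB read starting just before the first block boundary spans two blocks.
    for b in get_file_block_locations(file_blocks, 64 * MB - 512, 1 * MB):
        print(b.offset // MB, b.hosts)
    # prints: 0 ('dn1', 'dn2', 'dn3')
    #         64 ('dn2', 'dn4', 'dn5')
```

Returning per-block host lists is what lets a client schedule work close to the data, which is the main reason to expose this call at all.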
> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
>                 Key: HADOOP-3754
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3754
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code generation for a
> variety of languages (Java, C++, Python, PHP, etc.). It would be nice if HDFS APIs were exposed
> through Thrift. That would allow applications written in any programming language to access HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

