Message-ID: <662857095.1218008924360.JavaMail.jira@brutus>
Date: Wed, 6 Aug 2008 00:48:44 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3754) Support a Thrift Interface to access files/directories in HDFS

[ https://issues.apache.org/jira/browse/HADOOP-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620164#action_12620164 ]

dhruba borthakur commented on HADOOP-3754:
------------------------------------------

Thanks, Pete & Nitay, for the detailed comments. Thanks a bunch.

1. The patch includes the Thrift binary for Linux; see lib/thrift/thrift and lib/thrift/libthrift.jar. A Linux compile therefore does not have to download any external libraries or utilities.

2. The proxy server uses the message from the Hadoop IOException to create its own exception (a sketch follows point 10 below). This is the best we can do for now; if we want to improve it later, we can do that. The application sees the real exception string, so it should be enough for debugging purposes, shouldn't it?

3. Added a note to chown saying that it is not atomic. This applies to hdfs.py only, not to the chown Thrift interface.

4. I like your idea of carrying the checksum all the way from the client, but maybe we can postpone that to a later date.

5. The Python command line needs more work. However, I am not targeting the Python wrapper as a piece that an application will use as-is; it is there to demonstrate how to access HDFS from a Python script.

6. Added a README that describes the approach, build, and deployment process. I plan to write a Wiki page once this patch gets committed.

7. Performance measurements will come at a later date.

8. Set the default minimum number of threads to 10.

9. The change to build-contrib.xml ensures that the generated jar file(s) are on the CLASSPATH while compiling HadoopThriftServer.java.

10. I would wait to include fb303. It is mostly for statistics management and process management and can be added at a later date. It might be useful to use HadoopMetrics, or to go via HADOOP-3772.
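For point 2, here is a minimal sketch of the message-only exception wrapping, assuming a Thrift-declared exception type; the ThriftIOException name and the rm() handler method are illustrative, not necessarily what the patch defines:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical proxy-side handler; names are assumptions for illustration.
public class HadoopThriftHandlerSketch {

  // Stand-in for a Thrift-declared exception that carries only a message.
  public static class ThriftIOException extends Exception {
    public ThriftIOException(String message) {
      super(message);
    }
  }

  private FileSystem fs; // initialized elsewhere from the Hadoop configuration

  // Only the IOException's message string crosses the RPC boundary,
  // so the remote client sees the original Hadoop error text.
  public boolean rm(String path) throws ThriftIOException {
    try {
      return fs.delete(new Path(path), true);
    } catch (IOException e) {
      throw new ThriftIOException(e.getMessage());
    }
  }
}
{code}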
11. I added a new call, setInactiveTimeoutPeriod(), that allows an application to specify how long the proxy server should remain active, starting from the last call to it. If this timer expires, the proxy server closes all open files and shuts down. The default inactivity timeout is 1 hour. This does not completely address Nitay's problem, but it may solve it to a certain extent; if Nitay could merge in his code for a per-handle timer once this patch is committed, that would be great. (A rough sketch of the timer appears at the end of this message.)

12. If, at a future time, we add Thrift APIs to the Namenode, Datanode, etc., they would have to be located in src/hdfs and not in contrib. Even if we decide to keep them in contrib, they could go in src/contrib/thriftfs/namenode, src/contrib/thriftfs/datanode, etc. I think the API in this patch should try to resemble the existing API in fs.FileSystem.

13. I added a getFileBlockLocations API to allow fetching the block locations of a file.

> Support a Thrift Interface to access files/directories in HDFS
> --------------------------------------------------------------
>
>          Key: HADOOP-3754
>          URL: https://issues.apache.org/jira/browse/HADOOP-3754
>      Project: Hadoop Core
>   Issue Type: New Feature
>     Reporter: dhruba borthakur
>     Assignee: dhruba borthakur
>  Attachments: hadoopthrift2.patch, hadoopthrift3.patch, thrift1.patch
>
>
> Thrift is a cross-language RPC framework. It supports automatic code generation for a variety of languages (Java, C++, Python, PHP, etc.). It would be nice if the HDFS APIs were exposed through Thrift; that would allow applications written in any programming language to access HDFS.
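Regarding point 11 above, a minimal sketch of the inactivity shutdown, assuming a periodic daemon Timer that compares the current time against the last RPC. Apart from setInactiveTimeoutPeriod(), all names here are assumptions, not the patch's actual code:

{code:java}
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical monitor; the real server would hook touch() into every RPC.
public class InactivityMonitorSketch {
  private static final long DEFAULT_TIMEOUT_MS = 60 * 60 * 1000; // 1 hour

  private volatile long timeoutMs = DEFAULT_TIMEOUT_MS;
  private volatile long lastCallTime = System.currentTimeMillis();
  private final Timer timer = new Timer(true); // daemon thread

  public InactivityMonitorSketch() {
    // Every minute, check whether the server has been idle too long.
    timer.schedule(new TimerTask() {
      public void run() {
        if (System.currentTimeMillis() - lastCallTime > timeoutMs) {
          shutdown();
        }
      }
    }, 60 * 1000, 60 * 1000);
  }

  // Called at the start of every proxy RPC to reset the idle clock.
  public void touch() {
    lastCallTime = System.currentTimeMillis();
  }

  // Corresponds to the setInactiveTimeoutPeriod() call described above.
  public void setInactiveTimeoutPeriod(long ms) {
    timeoutMs = ms;
  }

  private void shutdown() {
    // The real server would close all open file handles before exiting.
    timer.cancel();
    System.exit(0);
  }
}
{code}

A per-handle timer, as Nitay suggested, could reuse the same touch() idea with one timestamp per open file instead of a single global one.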