Message-ID: <33443217.1171402085851.JavaMail.jira@brutus>
Date: Tue, 13 Feb 2007 13:28:05 -0800 (PST)
From: "Tom White (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-997) Implement S3 retry mechanism for failed block transfers
In-Reply-To: <4074771.1170970505785.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472873 ]

Tom White commented on HADOOP-997:
----------------------------------

> + Is it true that the
> Jets3tFileSystemStore#initialize runs after the retry proxy has been set up, so initialization will be retried if it throws an Exception? Is this intentional? Perhaps initialization should be outside the purview of retry.

This was not intentional. I think initialization should probably not be retried currently (in the future it might be worth thinking through the initialization cases that need retrying). It looks like this might be the cause of the exception you found.

> + Minor comment: In version 2 of the retry mechanism, you'd be able to narrow retries to happen only for certain exception types: e.g. for IOExceptions only rather than for Exception as it's currently written.

S3 should only retry for IOExceptions - I'll fix this.

> + In S3OutputStream#newBackupFile, you replace File.createTempFile with your own tmp file name fabricator. Any reason why you didn't stay with createTempFile, just using the override with this signature instead: http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File)? (You do this in a few places in this patch.)

I did this for consistency with HDFS and also because of a recent issue with creating files in /tmp. I agree that createTempFile with the signature you suggest is the way to go. I'll change it.

> + In S3FileSystem.java#createDefaultStore, should the retry params be exposed as configurables? (I'm fine if the rejoinder is that this makes for too much configuration.)

Yes, this crossed my mind and I think we should have it. It also occurred to me that with annotation-driven retries (HADOOP-601) you wouldn't be able to expose the parameters as configurables (as far as I know).

> + On initialize, should Jets3tFileSystemStore clean up its tmp directory? (If crash, deleteOnExit won't get a chance to run.)

Possibly. Although the files should be deleted after transfer anyway. The deleteOnExit was a backup, but you are right that it doesn't always run.
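The exception-narrowed retry discussed above (retry on IOException only, not on every Exception) can be sketched with java.lang.reflect.Proxy, as proposed for the block-transfer wrapper. This is a hypothetical illustration, not Hadoop's actual retry implementation; the names RetryProxy and MAX_RETRIES are made up for the example.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Illustrative retry proxy: calls on the wrapped object are retried only
// when they fail with an IOException; any other exception propagates at once.
public class RetryProxy implements InvocationHandler {
  private static final int MAX_RETRIES = 3;  // illustrative fixed limit
  private final Object delegate;

  private RetryProxy(Object delegate) { this.delegate = delegate; }

  @SuppressWarnings("unchecked")
  public static <T> T create(Class<T> iface, T delegate) {
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] { iface }, new RetryProxy(delegate));
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    for (int attempt = 1; ; attempt++) {
      try {
        return method.invoke(delegate, args);
      } catch (InvocationTargetException e) {
        Throwable cause = e.getCause();
        // Narrowed retry: only transient I/O failures are retried.
        if (!(cause instanceof IOException) || attempt >= MAX_RETRIES) {
          throw cause;
        }
      }
    }
  }
}
```

Because the handler unwraps InvocationTargetException, callers see the original exception type, which is what makes the instanceof check (and the narrowing this comment thread asks for) possible.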
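The createTempFile change agreed above amounts to using the three-argument overload so the backup file lands in a chosen directory rather than the system /tmp. A minimal sketch, assuming an illustrative prefix and directory (the real patch's names may differ):

```java
import java.io.File;
import java.io.IOException;

// Sketch of newBackupFile using File.createTempFile(prefix, suffix, directory)
// instead of a hand-rolled tmp file name. "s3fs-" and the directory are
// illustrative choices, not taken from the actual patch.
public class BackupFileDemo {
  public static File newBackupFile(File localDir) throws IOException {
    localDir.mkdirs();
    // Created inside localDir, avoiding the recent /tmp issue mentioned above.
    File result = File.createTempFile("s3fs-", ".tmp", localDir);
    result.deleteOnExit();  // a backup only; files are also deleted after transfer
    return result;
  }
}
```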
> + If an IOException in retrieveBlock in Jets3tFileSystemStore, you do a closeQuietly on the out stream. It's followed by a finally that does closeQuietly on both the in and out streams. I don't follow what's going on here. Why not just leave all closing to the finally block? (Suggest adding a comment on why the special-case handling of the out stream in the IOException block.)

On failure I want to delete the file, but to do this I close the stream first. I agree a better comment is needed.

Thanks for the detailed feedback. I'll look at making improvements over the coming days.

> Implement S3 retry mechanism for failed block transfers
> -------------------------------------------------------
>
>                 Key: HADOOP-997
>                 URL: https://issues.apache.org/jira/browse/HADOOP-997
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>     Attachments: HADOOP-997.patch
>
>
> HADOOP-882 improves S3FileSystem so that when certain communications problems with S3 occur the operation is retried. However, the retry mechanism cannot handle a block transfer failure, since blocks may be very large and we don't want to buffer them in memory. This improvement is to write a wrapper (using java.lang.reflect.Proxy if possible - see discussion in HADOOP-882) that can retry block transfers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.