From: "Boris Shkolnik (JIRA)"
To: core-dev@hadoop.apache.org
Date: Wed, 1 Apr 2009 15:54:13 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4045) Increment checkpoint if we see failures in rollEdits

[ https://issues.apache.org/jira/browse/HADOOP-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694814#action_12694814 ] Boris Shkolnik
commented on HADOOP-4045:
----------------------------------------

1. FSImage.setCheckpointTime(): variable al is not used.
bq. Fixed.

2. processIOError(ArrayList sds) may be eliminated.
bq. This would force using the two-argument version of the function everywhere, in most cases with "true" as the second argument.

3. I would also get rid of processIOError(ArrayList errorStreams). The point is that it is better to have only one processIOError in each class; otherwise it can get as bad as it is now, with all the different variants of it. If you think it is a lot of changes, then let's at least make both of them private.
bq. See 2.

4. Do we want to make removedStorageDirs a map in order to avoid adding the same directory to it twice, or does that never happen?
bq. Good idea. It will need a separate JIRA.

5. Same with Storage.storageDirs. If we search in a collection, then we might want to use a searchable collection. This may be done in a separate issue.
bq. Same as 4.

6. It's somewhat confusing: FSImage.processIOError() calls editLog.processIOError(), and then FSEditLog.processIOError() calls fsimage.processIOError(). Is it going to converge at some point?
bq. It should. Every time processIOError calls its counterpart in the other class, it passes _false_ as the second (propagate) argument to make sure the counterpart will not call the original function back.

7. setCheckpointTime() ignores IO errors. Just mentioning this; I don't see how to avoid it. Failed streams/directories will be removed the next time flushAndSync() is called.
bq. Yes, it should be caught elsewhere.
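Point 6 describes two classes that call each other's processIOError and rely on a propagate flag to keep the mutual calls from recursing forever. The sketch below is a minimal, hypothetical model of that pattern, not the actual FSImage/FSEditLog code: the classes Dir, ImageSketch and EditLogSketch are invented for illustration, edit streams are represented by their directories, and removedStorageDirs is modeled as a set in the spirit of the de-duplication idea in point 4.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a storage directory (not Hadoop's StorageDirectory). */
final class Dir {
    final String name;
    Dir(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

/** Hypothetical, simplified stand-in for FSImage. */
class ImageSketch {
    final List<Dir> storageDirs = new ArrayList<>();
    // A set, so the same directory cannot be recorded as removed twice (cf. point 4).
    final Set<Dir> removedStorageDirs = new LinkedHashSet<>();
    EditLogSketch editLog;
    int calls = 0; // number of processIOError invocations, used to check termination

    void processIOError(List<Dir> failed, boolean propagate) {
        calls++;
        for (Dir d : failed) {
            storageDirs.remove(d);
            removedStorageDirs.add(d);
        }
        if (propagate) {
            // Notify the edit log, passing propagate=false so it does not call back here.
            editLog.processIOError(failed, false);
        }
    }
}

/** Hypothetical, simplified stand-in for FSEditLog; each stream is represented by its directory. */
class EditLogSketch {
    final List<Dir> editStreams = new ArrayList<>();
    ImageSketch image;
    int calls = 0;

    void processIOError(List<Dir> failed, boolean propagate) {
        calls++;
        editStreams.removeAll(failed);
        if (propagate) {
            // Notify the image, again with propagate=false to stop the recursion.
            image.processIOError(failed, false);
        }
    }
}
```

Because each call made on behalf of the other class passes false, a single error report reaches each class at most once, which is the convergence asked about in point 6.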
> Increment checkpoint if we see failures in rollEdits
> ----------------------------------------------------
>
> Key: HADOOP-4045
> URL: https://issues.apache.org/jira/browse/HADOOP-4045
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: Lohit Vijayarenu
> Assignee: Boris Shkolnik
> Priority: Critical
> Fix For: 0.19.2
>
> Attachments: HADOOP-4045-1.patch, HADOOP-4045.patch
>
>
> In _FSEditLog::rollEdits_, if we encounter an error while opening edits.new, we remove the storage directory associated with it. At this point we should also increment the checkpoint on all other directories.
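The issue asks that, when opening edits.new fails in one directory during rollEdits, that directory be dropped and the checkpoint incremented on the remaining ones. A minimal sketch of that control flow, under stated assumptions: EditsOpener and RollEditsSketch are hypothetical names, each directory's checkpoint time is a plain long in a map, and "increment" is modeled as setting every surviving directory to one more than the current maximum. It is not the HADOOP-4045 patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hook that opens edits.new in a directory; may fail with IOException. */
interface EditsOpener {
    void openEditsNew(String dir) throws IOException;
}

/** Hypothetical model: each storage directory records its own checkpoint time. */
class RollEditsSketch {
    final Map<String, Long> checkpointTime = new LinkedHashMap<>();
    final List<String> removed = new ArrayList<>();

    void rollEdits(EditsOpener opener) {
        boolean sawFailure = false;
        Iterator<Map.Entry<String, Long>> it = checkpointTime.entrySet().iterator();
        while (it.hasNext()) {
            String dir = it.next().getKey();
            try {
                opener.openEditsNew(dir);
            } catch (IOException e) {
                it.remove();          // drop the directory whose edits.new could not be opened
                removed.add(dir);
                sawFailure = true;
            }
        }
        if (sawFailure) {
            // Assumption: all surviving directories get one shared, incremented checkpoint time.
            long next = checkpointTime.values().stream()
                    .mapToLong(Long::longValue).max().orElse(0L) + 1;
            checkpointTime.replaceAll((d, t) -> next);
        }
    }
}
```

The presumed motivation (an assumption; the issue does not spell it out) is that the removed directory keeps its older checkpoint time, so on restart it can be told apart from the up-to-date directories.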