hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4045) Increment checkpoint if we see failures in rollEdits
Date Thu, 05 Feb 2009 20:09:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670883#action_12670883 ]

Konstantin Shvachko commented on HADOOP-4045:

Some more requirements for {{processIOError()}}:
# {{FSEdits.processIOError()}} should always unlock the storage directory being removed from service.
This is necessary because, if only the edits file caused the problem, the directory will still
be locked when we try to reuse it later, as proposed in HADOOP-4885.
# It should always call {{incrementCheckpointTime()}}. Otherwise the abandoned directories
may mistakenly be used for loading the latest image/edits.
# {{incrementCheckpointTime()}} should not recursively call {{processIOError()}}, but we
should still be able to handle failures of multiple edits directories.
# It should close the {{EditsOutputStream}}.
# All the logic with adding lost edits dirs to {{removedStorageDirs}} should be in {{processIOError()}}.
# {{FSEdits.processIOError(int)}} should be eliminated completely.
# It seems to me that the most appropriate prototype for {{processIOError()}} would be
{{FSEdits.processIOError(ArrayList<StorageDirectory> errorDirs)}}.
This might even let us have only one such method rather than two.
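
To make the requirements above concrete, here is a rough sketch of how a single {{processIOError(ArrayList<StorageDirectory> errorDirs)}} could satisfy them. It is not the actual Hadoop code: {{StorageDirectory}}, {{EditsOutputStream}} and {{EditLogSketch}} below are simplified hypothetical stand-ins, {{failOnWrite}} and {{retire()}} are invented for illustration, and having {{incrementCheckpointTime()}} return the directories it could not write is only one possible way to avoid the recursive call.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a storage directory; not the real Hadoop class.
class StorageDirectory {
    final String name;
    boolean locked = true;         // directories in service hold a lock
    boolean failOnWrite = false;   // illustration hook: simulates a bad disk
    long checkpointTime = 0;

    StorageDirectory(String name) { this.name = name; }

    void unlock() { locked = false; }

    void writeCheckpointTime(long time) throws IOException {
        if (failOnWrite) throw new IOException("cannot write to " + name);
        checkpointTime = time;
    }
}

// Hypothetical stand-in for the per-directory edits stream.
class EditsOutputStream {
    final StorageDirectory dir;
    boolean closed = false;
    EditsOutputStream(StorageDirectory dir) { this.dir = dir; }
    void close() { closed = true; }
}

class EditLogSketch {
    final List<StorageDirectory> activeDirs = new ArrayList<>();
    final List<StorageDirectory> removedStorageDirs = new ArrayList<>();
    final List<EditsOutputStream> streams = new ArrayList<>();
    long checkpointTime = 0;

    void addDirectory(StorageDirectory sd) {
        activeDirs.add(sd);
        streams.add(new EditsOutputStream(sd));
    }

    // Single entry point taking the list of failed directories (item 7).
    void processIOError(ArrayList<StorageDirectory> errorDirs) {
        List<StorageDirectory> failed = new ArrayList<>(errorDirs);
        do {
            for (StorageDirectory sd : failed) {
                retire(sd);
            }
            // Always runs at least once (item 2). Directories that fail here
            // are handled by the next pass of this loop, not by recursion (item 3).
            failed = incrementCheckpointTime();
        } while (!failed.isEmpty());
    }

    private void retire(StorageDirectory sd) {
        if (!activeDirs.remove(sd)) return;   // not in service, nothing to do
        for (Iterator<EditsOutputStream> it = streams.iterator(); it.hasNext();) {
            EditsOutputStream s = it.next();
            if (s.dir == sd) { s.close(); it.remove(); }   // item 4
        }
        sd.unlock();                  // item 1
        removedStorageDirs.add(sd);   // item 5
    }

    // Never calls processIOError(); reports failures to the caller instead.
    List<StorageDirectory> incrementCheckpointTime() {
        checkpointTime++;
        List<StorageDirectory> failed = new ArrayList<>();
        for (StorageDirectory sd : activeDirs) {
            try {
                sd.writeCheckpointTime(checkpointTime);
            } catch (IOException e) {
                failed.add(sd);
            }
        }
        return failed;
    }
}
```

In this sketch the loop terminates because every pass with a non-empty failure list takes at least one directory out of service. For example, with three directories where one is reported as failed and a second one fails while the checkpoint time is being written, both end up unlocked in {{removedStorageDirs}}, and only the surviving directory carries the latest checkpoint time.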

> Increment checkpoint if we see failures in rollEdits
> ----------------------------------------------------
>                 Key: HADOOP-4045
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4045
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.19.1
> In _FSEditLog::rollEdits_, if we encounter an error while opening edits.new, we remove
> the storage directory associated with it. At this point we should also increment the checkpoint
> on all other directories.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
