hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1801) Remove use of timestamps to identify checkpoints and logs
Date Fri, 29 Apr 2011 06:13:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026872#comment-13026872 ]

Todd Lipcon commented on HDFS-1801:
-----------------------------------

The previous patch attached here broke TestNameEditsConfig due to the following situation:

Imagine there is one image dir, /image/1, and two edits dirs, /edits/1 and /edits/2.
You have the following sequence:
- Start new NN
- Write some edits which go to both edits dirs
- /edits/1 fails
- Write some more edits, now going only to /edits/2
- NN crashes
- /edits/1 is recovered but /edits/2 goes offline
- NN restarts.

At this point the only available edits dir, /edits/1, is missing the most recent edits. It
used to be that we could detect this by the fstime, which we incremented on failures. But time
is an arbitrary measure, and now that we have txids, it's better to record the txid.

The new version of this patch still gets rid of fstime, but creates a new file called {{seen_txid}},
which is rewritten from time to time with the current txid. It's currently rewritten on
failure and on roll, but a rewrite could also be triggered after some number of transactions.
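
To make the idea concrete, here is a minimal sketch of rewriting {{seen_txid}} in every
storage directory that is still healthy. It is not the code from the patch: the class name,
the method names, and the on-disk format (the txid as decimal text followed by a newline) are
all assumptions made for illustration, and error handling for a directory that fails during
the write is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the HDFS-1801 patch.
public class SeenTxidWriter {
    static final String FILE_NAME = "seen_txid";

    // Called on a storage failure or a log roll (assumed trigger points):
    // records the current txid in each directory that is still usable.
    static void writeSeenTxId(List<Path> healthyDirs, long currentTxId) throws IOException {
        // Assumed format: decimal txid plus a trailing newline.
        byte[] body = (currentTxId + "\n").getBytes(StandardCharsets.UTF_8);
        for (Path dir : healthyDirs) {
            // Files.write creates the file or truncates an existing one.
            Files.write(dir.resolve(FILE_NAME), body);
        }
    }

    // Reads back the txid recorded in one directory.
    static long readSeenTxId(Path dir) throws IOException {
        byte[] raw = Files.readAllBytes(dir.resolve(FILE_NAME));
        return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
    }
}
```

In the scenario above, the rewrite triggered by the /edits/1 failure would only reach /image/1
and /edits/2, so a later restart can still find the recorded txid in /image/1 even with
/edits/2 offline.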

On startup, the NN will look across all configured directories and take the maximum txid stored
in {{seen_txid}}. If it can't find edits that include this txid, it will refuse to start.
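
The startup check can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the patch itself: {{SeenTxidCheck}}, {{EditRange}} and the method
names are hypothetical, each readable edit log is modelled as an inclusive range of txids, and
an empty list of {{seen_txid}} values is treated as "nothing to verify".

```java
import java.util.List;

// Hypothetical sketch of the startup check, not the actual NameNode code.
public class SeenTxidCheck {
    // Inclusive range of txids covered by one readable edit log (assumed model).
    static final class EditRange {
        final long firstTxId;
        final long lastTxId;

        EditRange(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }

        boolean contains(long txid) {
            return firstTxId <= txid && txid <= lastTxId;
        }
    }

    // Largest seen_txid value found across all configured directories.
    // Returns -1 when no directory recorded one (assumption of this sketch).
    static long maxSeenTxId(List<Long> seenTxIds) {
        long max = -1;
        for (long txid : seenTxIds) {
            max = Math.max(max, txid);
        }
        return max;
    }

    // Refuses to start if no available edit log includes the maximum seen txid.
    static void checkStartup(List<Long> seenTxIds, List<EditRange> availableEdits) {
        long required = maxSeenTxId(seenTxIds);
        if (required < 0) {
            return; // no seen_txid anywhere, nothing to verify
        }
        for (EditRange range : availableEdits) {
            if (range.contains(required)) {
                return; // edits up to the recorded txid are present
            }
        }
        throw new IllegalStateException(
            "No available edits include seen txid " + required + "; refusing to start");
    }
}
```

For example, with made-up numbers: if the directories report {{seen_txid}} values of 100 and
120 but the only readable edits cover txids 1 through 100, {{checkStartup}} throws instead of
letting the NN come up on stale edits.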

This addition makes TestNameEditsConfig pass again.

> Remove use of timestamps to identify checkpoints and logs
> ---------------------------------------------------------
>
>                 Key: HDFS-1801
>                 URL: https://issues.apache.org/jira/browse/HDFS-1801
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-1801.txt, hdfs-1801.txt
>
>
> Currently, the NameNode validates checkpoint uploads by using timestamps associated with
> checkpoints and edit logs. However, now that we have transaction IDs that uniquely identify
> each point in time in the history of a namespace, it is more robust to simply use transaction
> IDs to identify images and edits.
> This JIRA is to remove the use of editsTime and checkpointTime and replace it with:
> * {{lastCheckpointTxId}} - the highest transaction ID reflected in the most recently
>   saved fsimage file
> * {{lastLogRollTxId}} - the highest transaction ID in {{edits}} when {{rollFsImage}}
>   was called by the checkpointing node.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
