hadoop-hdfs-issues mailing list archives

From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1521) Persist transaction ID on disk between NN restarts
Date Wed, 23 Mar 2011 17:32:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010242#comment-13010242 ]

Ivan Kelly commented on HDFS-1521:
----------------------------------

I think this BackupNode failure is the result of a preexisting problem with BackupNode: the
timing with which OP_JSPOOL_START arrives at the BackupNode. The sequence is:

||     || NameNode              || BackupNode                       ||
|  1. |                       | doCheckpoint                     |
|  2. | startCheckpoint       |                                  |
|  3. | log(OP_JSPOOL_START)  |                                  |
|  4. |                       | download images                  |
|  5. |                       | merge                            |
|  6. |                       | upload new image                 |
|  7. |                       | convergeJournalSpool             |
|  8. | flush editlog buffers |                                  |
|  9. |                       | startJournalSpool                |
|     |                       |                                  |
| ... | ...                   | ...                              |
|     |                       |                                  |
| 10. |                       | doCheckpoint                     |
| 11. | startCheckpoint       |                                  |
| 12. | log(OP_JSPOOL_START)  |                                  |
| 13. |                       | download images                  |
| 14. |                       | merge                            |
| 15. |                       | upload new image                 |
| 16. |                       | convergeJournalSpool (EXCEPTION) |

Basically, the OP_JSPOOL_START from the first checkpoint doesn't reach the BackupNode until
after that checkpoint has finished (step 8). When it does arrive, a spool is created (step 9),
and that spool is only converged during the next checkpoint (step 16), by which point it
contains all the transactions from the first checkpoint onwards.
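
The ordering problem can be reproduced in a toy model. The sketch below is not HDFS code: the
classes, method names, and StaleSpoolError are hypothetical stand-ins, and the check inside
converge_journal_spool is an assumption about why the second converge fails (the actual
exception is not shown in this message). It only illustrates how a late OP_JSPOOL_START
leaves a spool open across two checkpoints.

```python
class StaleSpoolError(Exception):
    """Toy-model error: the spool overlaps edits the checkpoint already covers."""


class BackupNode:
    def __init__(self):
        self.spool = None  # None means "not spooling"

    def receive(self, txid, op):
        if op == "OP_JSPOOL_START":
            if self.spool is None:
                self.spool = []          # stands in for startJournalSpool
        elif self.spool is not None:
            self.spool.append(txid)      # edits are spooled while a spool is open

    def converge_journal_spool(self, checkpoint_txid):
        spool, self.spool = self.spool, None
        if spool is None:
            return                       # nothing was spooled for this checkpoint
        # Assumption: a healthy spool only holds edits newer than the checkpoint.
        stale = [t for t in spool if t <= checkpoint_txid]
        if stale:
            raise StaleSpoolError(f"spool holds already-checkpointed txids {stale}")


class NameNode:
    def __init__(self, backup):
        self.backup = backup
        self.txid = 0
        self.buffer = []                 # edits logged but not yet sent

    def log(self, op):
        self.txid += 1
        self.buffer.append((self.txid, op))

    def flush(self):
        for txid, op in self.buffer:
            self.backup.receive(txid, op)
        self.buffer.clear()

    def start_checkpoint(self):
        self.log("OP_JSPOOL_START")      # buffered, not necessarily delivered yet
        return self.txid


def run_two_checkpoints(flush_promptly):
    bn = BackupNode()
    nn = NameNode(bn)
    for _ in range(2):
        ckpt = nn.start_checkpoint()     # steps 2-3 and 11-12
        if flush_promptly:
            nn.flush()                   # OP_JSPOOL_START arrives before the merge
        # download images, merge, upload new image are omitted
        bn.converge_journal_spool(ckpt)  # steps 7 and 16
        nn.flush()                       # step 8: the late delivery
        nn.log("OP_MKDIR")               # ordinary edits between checkpoints
        nn.log("OP_MKDIR")
        nn.flush()
    return "ok"
```

Calling run_two_checkpoints(True) completes both checkpoints, while run_two_checkpoints(False)
raises StaleSpoolError at the second converge, matching the shape of the sequence in the table.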

> Persist transaction ID on disk between NN restarts
> --------------------------------------------------
>
>                 Key: HDFS-1521
>                 URL: https://issues.apache.org/jira/browse/HDFS-1521
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.0
>
>         Attachments: FSImageFormat.patch, HDFS-1521.diff, HDFS-1521.diff, HDFS-1521.diff,
> HDFS-1521.diff, HDFS-1521.diff, hdfs-1521.3.txt, hdfs-1521.4.txt, hdfs-1521.5.txt, hdfs-1521.txt,
> hdfs-1521.txt, hdfs-1521.txt
>
>
> For HDFS-1073 and other future work, we'd like to have the concept of a transaction ID
> that is persisted on disk with the image/edits. We already have this concept in the NameNode
> but it resets to 0 on restart. We can also use this txid to replace the _checkpointTime_ field,
> I believe.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
