hadoop-hdfs-issues mailing list archives

From "Vitalii Tymchyshyn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2084) Sometimes backup node/secondary name node stops with exception
Date Tue, 21 Jun 2011 16:03:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052635#comment-13052635 ]

Vitalii Tymchyshyn commented on HDFS-2084:
------------------------------------------

Some more details:
I am working with jobs, each of which produces multiple tasks. ZooKeeper is used to spread the tasks over
the computing cluster.
Tasks within a single job communicate through HDFS: a task produces result files that other tasks
read when they start.
Each job has a main control process. Before the job starts, it creates a directory to hold the communication
files and the original job arguments.
After the job has finished, it removes the directory with a single call.
The problem was reproduced today with a number of communication files of a single job.
Note that although the job's directory is deleted once all the tasks are done, there may be some "hanging" tasks
that timed out and were restarted. In that case, a hung task that resumes may try to access a file
that no longer exists.
I will check the logs to see whether this was the case.
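
To make the suspected sequence concrete, here is a minimal, self-contained Java sketch of it. It is not Hadoop code and not the contents of patch.diff; the class EditReplaySketch, its Node and Edit types, the operation names and the /jobs/42 paths are all hypothetical. It only assumes that a times update for a path can be replayed after the directory containing it was deleted, and shows how a null check during replay would skip such a record instead of throwing a NullPointerException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EditReplaySketch {
    /** Hypothetical stand-in for an inode; it only tracks the two timestamps. */
    static final class Node {
        long mtime;
        long atime;
    }

    /** Hypothetical edit-log record: one operation applied to one path. */
    static final class Edit {
        final String op;   // "CREATE", "DELETE" or "SET_TIMES"
        final String path;
        final long time;

        Edit(String op, String path, long time) {
            this.op = op;
            this.path = path;
            this.time = time;
        }
    }

    /**
     * Replays the edits into the namespace map and returns the paths of
     * SET_TIMES records that were skipped because the path no longer exists.
     */
    static List<String> replay(Map<String, Node> namespace, List<Edit> edits) {
        List<String> skipped = new ArrayList<>();
        for (Edit e : edits) {
            switch (e.op) {
                case "CREATE": {
                    namespace.put(e.path, new Node());
                    break;
                }
                case "DELETE": {
                    // Recursive delete: drop the path and everything below it.
                    namespace.keySet().removeIf(
                            p -> p.equals(e.path) || p.startsWith(e.path + "/"));
                    break;
                }
                case "SET_TIMES": {
                    Node node = namespace.get(e.path);
                    if (node == null) {
                        // Without this guard, node.mtime below would throw a
                        // NullPointerException for a path deleted earlier.
                        skipped.add(e.path);
                        break;
                    }
                    node.mtime = e.time;
                    node.atime = e.time;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown op: " + e.op);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, Node> namespace = new HashMap<>();
        List<Edit> edits = Arrays.asList(
                new Edit("CREATE", "/jobs/42", 1L),
                new Edit("CREATE", "/jobs/42/part-0", 2L),
                new Edit("DELETE", "/jobs/42", 3L),            // job finished
                new Edit("SET_TIMES", "/jobs/42/part-0", 4L)); // late task
        System.out.println("skipped: " + replay(namespace, edits));
    }
}
```

Whether skipping the record is the right behaviour for the real edit log is an open question; the sketch only illustrates the ordering that I suspect leads to the exception.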


> Sometimes backup node/secondary name node stops with exception
> --------------------------------------------------------------
>
>                 Key: HDFS-2084
>                 URL: https://issues.apache.org/jira/browse/HDFS-2084
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0
>         Environment: FreeBSD
>            Reporter: Vitalii Tymchyshyn
>         Attachments: patch.diff
>
>
> 2011-06-17 11:43:23,096 ERROR org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in doCheckpoint: 
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
>         at org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
>         at org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
