hadoop-common-issues mailing list archives

From "Devaraj K (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6774) Namenode is not able to recover from disk full condition
Date Fri, 07 Jan 2011 13:55:53 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978779#action_12978779 ]

Devaraj K commented on HADOOP-6774:
-----------------------------------

When the disk becomes full, the name node file system (fsimage, edits) gets corrupted
and the name node shuts down. When we try to restart it, the name node does not start
because the name node file system is corrupted.

This can be avoided as follows:

We can implement a daemon that periodically monitors disk usage. If the disk usage
reaches the threshold value, the daemon puts the name node into safe mode so that no
modifications to the file system occur. Once the disk usage drops back below the
threshold, the name node is taken out of safe mode.
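
The monitoring loop described above could be sketched roughly as follows. This is only
an illustration, not existing Hadoop code: the class DiskSpaceMonitor, the
SafeModeControl hook, and the minFreeBytes threshold are all hypothetical names, and the
threshold is expressed as minimum free bytes rather than as a usage percentage. How the
hook actually switches the name node into or out of safe mode is left open.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a disk space monitor; not part of Hadoop. */
final class DiskSpaceMonitor {

    /** Hypothetical hook standing in for whatever toggles name node safe mode. */
    interface SafeModeControl {
        void enterSafeMode();
        void leaveSafeMode();
    }

    private final LongSupplier usableBytes;  // e.g. new File(dir)::getUsableSpace
    private final long minFreeBytes;         // assumed threshold, in bytes
    private final SafeModeControl control;
    private boolean inSafeMode = false;

    DiskSpaceMonitor(LongSupplier usableBytes, long minFreeBytes, SafeModeControl control) {
        this.usableBytes = usableBytes;
        this.minFreeBytes = minFreeBytes;
        this.control = control;
    }

    /** One polling step: call the hook only when the state changes. */
    synchronized void check() {
        boolean low = usableBytes.getAsLong() < minFreeBytes;
        if (low && !inSafeMode) {
            control.enterSafeMode();
            inSafeMode = true;
        } else if (!low && inSafeMode) {
            control.leaveSafeMode();
            inSafeMode = false;
        }
    }

    synchronized boolean isInSafeMode() {
        return inSafeMode;
    }

    /** Runs check() periodically on a daemon thread; the caller shuts the executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disk-space-monitor");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(this::check, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

A caller might pass new File(nameDir)::getUsableSpace as the supplier, where nameDir is
the directory holding fsimage and edits. A real implementation would probably also want
separate enter and leave thresholds, so that the name node does not flap in and out of
safe mode when free space hovers around a single value.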


Please share any other opinions or suggestions.


> Namenode is not able to recover from disk full condition
> --------------------------------------------------------
>
>                 Key: HADOOP-6774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6774
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.20.2
>         Environment: Linux sjc9-flash-grid00.ciq.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Ted Yu
>         Attachments: hadoop-6774.stack
>
>
> We ran an internal flow which resulted in:
> Exception in thread "main" java.lang.RuntimeException: initialization of flow executor failed
> After that we freed disk space on the Namenode server, but restarting Namenode failed.
> Here is from Namenode log:
> 2010-05-19 17:15:15,514 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: sjc1-qa-certiq1.sjc1.ciq.com/10.201.8.247:9000
> 2010-05-19 17:15:15,516 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2010-05-19 17:15:15,518 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2010-05-19 17:15:15,579 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2010-05-19 17:15:15,579 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2010-05-19 17:15:15,579 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
> 2010-05-19 17:15:15,588 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2010-05-19 17:15:15,590 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
> 2010-05-19 17:15:15,637 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1874
> 2010-05-19 17:15:16,202 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
> 2010-05-19 17:15:16,204 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 259450 loaded in 0 seconds.
> 2010-05-19 17:15:16,599 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>     at java.lang.Long.parseLong(Long.java:431)
>     at java.lang.Long.parseLong(Long.java:468)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:656)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:999)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1004)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1013)
> 2010-05-19 17:15:16,599 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

