hadoop-hdfs-user mailing list archives

From Jameson Li <hovlj...@gmail.com>
Subject Re: my hadoop cluster namenode crashed after modifying the timestamp in some of the nodes
Date Tue, 15 Feb 2011 03:50:26 GMT
Hi Todd,

Thanks very much. I think you are right.

I had used the hadoop-0.20-append patches that are mentioned here:
http://github.com/lenn0x/Hadoop-Append

After reading the patch 0002-HDFS-278.patch <https://github.com/lenn0x/Hadoop-Append/blob/master/0002-HDFS-278.patch>,
I found that the file "src/hdfs/org/apache/hadoop/hdfs/DFSClient.java" in my
cluster does not contain these lines:

    this.maxBlockAcquireFailures =
                            conf.getInt("dfs.client.max.block.acquire.failures",
                                        MAX_BLOCK_ACQUIRE_FAILURES);


It just looks like this:
    this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);
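
If I understand the refactored code correctly, getMaxBlockAcquireFailures(conf)
is just a small helper around the same conf.getInt() call, so the property is
still read; the hunk only fails to apply because the line was moved into a
helper. Here is a minimal, self-contained sketch of what I assume that helper
looks like (the default value 3 and the wrapper class are my guesses, not
copied from my tree):

    import org.apache.hadoop.conf.Configuration;

    public class MaxBlockAcquireFailuresCheck {
      // Assumed default; DFSClient defines its own MAX_BLOCK_ACQUIRE_FAILURES.
      private static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

      // Assumed shape of the helper the refactored DFSClient constructor calls:
      // it reads the same property the original patch context expected inline.
      private static int getMaxBlockAcquireFailures(Configuration conf) {
        return conf.getInt("dfs.client.max.block.acquire.failures",
                           MAX_BLOCK_ACQUIRE_FAILURES);
      }

      public static void main(String[] args) {
        Configuration conf = new Configuration();
        System.out.println("maxBlockAcquireFailures = "
            + getMaxBlockAcquireFailures(conf));
      }
    }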

So I changed 0002-HDFS-278.patch <https://github.com/lenn0x/Hadoop-Append/blob/master/0002-HDFS-278.patch>,
and the diff between my modified patch and the original one is:
diff 0002-HDFS-278.patch ../hadoop-new/patch-origion/0002-HDFS-278.patch
0a1,10
> From 56463073cf051f1e11b4d3921542979e53daead4 Mon Sep 17 00:00:00 2001
> From: Chris Goffinet <cg@chrisgoffinet.com>
> Date: Mon, 20 Jul 2009 17:20:13 -0700
> Subject: [PATCH 2/4] HDFS-278
>
> ---
>  src/hdfs/org/apache/hadoop/hdfs/DFSClient.java |   70 ++++++++++++++++++++++--
>  1 files changed, 64 insertions(+), 6 deletions(-)
>
> diff --git a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
2,3c12,13
< --- src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
< +++ src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
---
> --- a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
> +++ b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
19,20c29,32
< @@ -188,5 +192,7 @@ public class DFSClient implements FSConstants, java.io.Closeable {
<      this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);
---
> @@ -167,7 +171,9 @@ public class DFSClient implements FSConstants, java.io.Closeable {
>      this.maxBlockAcquireFailures =
>                             conf.getInt("dfs.client.max.block.acquire.failures",
>                                        MAX_BLOCK_ACQUIRE_FAILURES);
118a131,133
> --
> 1.6.3.1
>

Did I miss some of the patches for hadoop-0.20-append?
How can I recover my NN and get it working again so that I can export the data?

2011/2/14 Todd Lipcon <todd@cloudera.com>

> Hi Jameson,
>
> My first instinct is that you have an incomplete patch series for hdfs
> append, and that's what caused your problem. There were many bug fixes along
> the way for hadoop-0.20-append and maybe you've missed some in your manually
> patched build.
>
> -Todd
>
>
> On Mon, Feb 14, 2011 at 5:49 AM, Jameson Li <hovlj.ei@gmail.com> wrote:
>
>> Hi ,
>>
>> My hadoop version is based on the hadoop 0.20.2 release, patched with
>> HADOOP-4675, 5745, MAPREDUCE-1070, 551, 1089 (supporting ganglia31,
>> fair scheduler preemption, and hdfs append), and patched with
>> HADOOP-6099, HDFS-278, Patches-from-Dhruba-Borthakur, and HDFS-200
>> (supporting scribe).
>>
>> Last Friday I found that the clocks on some of my test hadoop cluster
>> nodes were wrong: they were several hours ahead of the correct time.
>> So I ran the following command and added it as a crontab job:
>> /usr/bin/rdate -s time-b.nist.gov
>>
>> And then my hadoop cluster namenode crashed after I restarted it.
>> I don't know whether this is related to modifying the time.
>>
>> The error log:
>> 2011-02-12 18:44:46,603 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of
>> blocks = 196
>> 2011-02-12 18:44:46,603 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
>> blocks = 0
>> 2011-02-12 18:44:46,603 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>> under-replicated blocks = 29
>> 2011-02-12 18:44:46,603 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>> over-replicated blocks = 41
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
>> STATE* Leaving safe mode after 69 secs.
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
>> STATE* Safe mode is OFF.
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
>> STATE* Network topology has 1 racks and 5 datanodes
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange:
>> STATE* UnderReplicatedBlocks has 29 blocks
>> 2011-02-12 18:44:46,886 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* ask 192.168.1.14:50010 to replicate
>> blk_-8806907658071633346_1750 to datanode(s) 192.168.1.83:50010
>> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* ask 192.168.1.83:50010 to replicate
>> blk_-7689075547598626554_1800 to datanode(s) 192.168.1.10:50010
>> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* ask 192.168.1.84:50010 to replicate
>> blk_-7587424527299099175_1717 to datanode(s) 192.168.1.10:50010
>> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* ask 192.168.1.84:50010 to replicate
>> blk_-6925943363757944243_1909 to datanode(s) 192.168.1.13:50010
>> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* ask 192.168.1.14:50010 to replicate
>> blk_-6835423500788375545_1928 to datanode(s) 192.168.1.10:50010
>> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* ask 192.168.1.83:50010 to replicate
>> blk_-6477488774631498652_1742 to datanode(s) 192.168.1.84:50010
>> 2011-02-12 18:44:46,889 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> ReplicationMonitor thread received Runtime exception.
>> java.lang.IllegalStateException: generationStamp (=1) ==
>> GenerationStamp.WILDCARD_STAMP java.lang.IllegalStateException:
>> generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
>>         at
>> org.apache.hadoop.hdfs.protocol.Block.validateGenerationStamp(Block.java:148)
>>         at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:156)
>>         at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:30)
>>         at java.util.TreeMap.put(TreeMap.java:545)
>>         at java.util.TreeSet.add(TreeSet.java:238)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor.addBlocksToBeInvalidated(DatanodeDescriptor.java:284)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.invalidateWorkForOneNode(FSNamesystem.java:2743)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeInvalidateWork(FSNamesystem.java:2419)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeDatanodeWork(FSNamesystem.java:2412)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2357)
>>         at java.lang.Thread.run(Thread.java:619)
>> 2011-02-12 18:44:46,892 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at hadoop5/192.168.1.84
>> ************************************************************/
>>
>>
>> Thanks,
>> Jameson
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
