hbase-dev mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBASE-3234 and bad datanode error
Date Fri, 28 Jan 2011 18:42:42 GMT
If you want to try to debug your issue, I suggest you take a look at
those datanode logs. For example, using the data from your first
email, here's what I would do:

The first exception is:

  WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_8207645655823156697_2836871
  java.io.IOException: Bad response 1 for block blk_8207645655823156697_2836871 from datanode 10.202.50.71:50010

Why was there a bad response? What was going on on that node
10.202.50.71 at that same time? If I grepped for that block id, what
would I see?
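
For the grep, something along these lines should do it (a sketch; the
log location varies by install, /var/log/hadoop is just a common default):

  # on 10.202.50.71: find every mention of that block in the datanode log
  grep "blk_8207645655823156697" /var/log/hadoop/hadoop-*-datanode-*.log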

Then I see:

  INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:17:03,252 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.202.50.78:50020. Already tried 0 time(s).

It would seem it's not able to connect to 10.202.50.78. Why? What was
going on in the logs on that node?
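
A quick way to check, assuming you can shell into 10.202.50.78 (a sketch):

  # was the DataNode process still alive?
  jps | grep DataNode
  # was anything listening on the IPC port the client kept retrying?
  netstat -tln | grep 50020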

Finally, regarding the jar mangling you did: did you just rename the
old jars, or did you move them aside?
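
I ask because bin/hadoop builds its classpath from a glob under
$HADOOP_HOME, so a renamed jar that still matches that glob can end up
on the classpath next to the new one. Moving the old jar out of
$HADOOP_HOME entirely avoids that, something like (a sketch, paths
assumed):

  # park the old jar outside the classpath before dropping in the new one
  mkdir -p ~/hadoop-jar-backup
  mv $HADOOP_HOME/hadoop-core-0.20.2+320.jar ~/hadoop-jar-backup/
  cp hadoop-core-0.20-append-r1056497.jar $HADOOP_HOME/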

J-D

On Fri, Jan 28, 2011 at 10:01 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> Last night, four reduce tasks failed with 'All datanodes .. are bad'
> although no region server came down.
>
> I wanted to try out hadoop-core-0.20-append-r1056497.jar and at the same
> time preserve the data on HDFS.
>
> I renamed hadoop-core-0.20.2+320.jar on all nodes under $HADOOP_HOME and
> copied hadoop-core-0.20-append-r1056497.jar to $HADOOP_HOME.
>
> After restarting Hadoop, jobtracker.jsp gave me Error 503.
> dfshealth.jsp is accessible and shows all data nodes.
> I verified that namenode is out of safemode.
>
> Here is tail of jobtracker log: http://pastebin.com/2sJv07wy
>
> Here is tail of namenode log: http://pastebin.com/M5nv2fEy
>
> Here is stack trace for job tracker: http://pastebin.com/xhadk1YA
>
> Here is jstack for namenode: http://pastebin.com/0CmE4qkV
>
> Since hadoop-core-0.20-append-r1056497.jar came with the 0.90 release, I want
> to get some opinions here before posting elsewhere.
> Hopefully someone can recommend the correct upgrade procedure.
>
> Thanks
>
> Here is fsck output:
> Status: HEALTHY
>  Total size:    1775777698618 B (Total open files size: 866 B)
>  Total dirs:    28384
>  Total files:   306547 (Files currently being written: 8)
>  Total blocks (validated):      312296 (avg. block size 5686200 B) (Total open file blocks (not validated): 4)
>  Minimally replicated blocks:   312296 (100.0 %)
>  Over-replicated blocks:        1 (3.2020904E-4 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.000003
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          7
>  Number of racks:               1
>
>
> The filesystem under path '/' is HEALTHY
>
