hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBASE-3234 and bad datanode error
Date Fri, 28 Jan 2011 18:47:53 GMT
I renamed the cdh3b2 jar hadoop-core-0.20.2+320.jar and renamed
hadoop-core-0.20-append-r1056497.jar to hadoop-core-0.20.2+320.jar.
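
On each node the swap amounted to roughly the following (a sketch; the
.orig suffix and in-place mv are my shorthand, not necessarily the exact
steps taken):

  cd $HADOOP_HOME
  # set the cdh3b2 jar aside under a different name
  mv hadoop-core-0.20.2+320.jar hadoop-core-0.20.2+320.jar.orig
  # give the append jar the old name so configs and scripts still resolve
  cp hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20.2+320.jar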

Thanks

On Fri, Jan 28, 2011 at 10:42 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> If you want to try to debug your issue, I suggest you take a look at
> those datanode logs. For example, using the data from your first
> email, here's what I would do:
>
> The first exception is
> WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor
> exception  for block blk_8207645655823156697_2836871
> java.io.IOException: Bad response 1 for block
> blk_8207645655823156697_2836871 from datanode
> 10.202.50.71:50010
>
> Why was there a bad response? What was going on on that node
> 10.202.50.71 at that same time? If I grepped for that block id, what
> would I see?
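>
> For instance, something like this on 10.202.50.71 would pull up that
> block's history (a sketch; the datanode log location varies by install,
> /var/log/hadoop is assumed):
>
>   grep blk_8207645655823156697_2836871 /var/log/hadoop/*datanode*.log*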
>
> Then I see
>
> INFO [2011-01-24 15:27:39] (ExecUtil.java:258) - 2011-01-24 23:17:03,252
> INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
> /10.202.50.78:50020. Already tried 0 time(s).
>
> It would seem it's not able to connect to 10.202.50.78. Why? What was
> going on in that node's logs at the time?
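>
> A couple of rough checks on 10.202.50.78 would narrow that down (port
> 50020 is the stock datanode IPC port; tools and paths assumed):
>
>   jps | grep DataNode          # is the datanode process still alive?
>   netstat -ln | grep 50020     # is the IPC port bound at all?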
>
> Finally, regarding the jar mangling you did: did you just rename the
> old jars, or did you move them aside?
>
> J-D
>
> On Fri, Jan 28, 2011 at 10:01 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > Last night, four reduce tasks failed with 'All datanodes .. are bad'
> > although no region server came down.
> >
> > I wanted to try out hadoop-core-0.20-append-r1056497.jar and at the same
> > time preserve data on hdfs.
> >
> > I renamed hadoop-core-0.20.2+320.jar on all nodes under $HADOOP_HOME and
> > copied hadoop-core-0.20-append-r1056497.jar to $HADOOP_HOME.
> >
> > After restarting hadoop, jobtracker.jsp gave me Error 503.
> > dfshealth.jsp is accessible and shows all data nodes.
> > I verified that namenode is out of safemode.
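> >
> > (Safemode was checked with something like:
> >
> >   hadoop dfsadmin -safemode get
> >
> > which only reports the current state, it doesn't change it.)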
> >
> > Here is tail of jobtracker log: http://pastebin.com/2sJv07wy
> >
> > Here is tail of namenode log: http://pastebin.com/M5nv2fEy
> >
> > Here is stack trace for job tracker: http://pastebin.com/xhadk1YA
> >
> > Here is jstack for namenode: http://pastebin.com/0CmE4qkV
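> >
> > The stack dumps were taken roughly like this (pid file names assume
> > the stock HADOOP_PID_DIR default of /tmp):
> >
> >   jstack $(cat /tmp/hadoop-*-jobtracker.pid) > jt-jstack.txt
> >   jstack $(cat /tmp/hadoop-*-namenode.pid) > nn-jstack.txt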
> >
> > Since hadoop-core-0.20-append-r1056497.jar came with the 0.90 release, I
> > want to get some opinion here before posting elsewhere.
> > Hopefully someone can recommend the correct upgrade procedure.
> >
> > Thanks
> >
> > Here is fsck output:
> > Status: HEALTHY
> >  Total size:    1775777698618 B (Total open files size: 866 B)
> >  Total dirs:    28384
> >  Total files:   306547 (Files currently being written: 8)
> >  Total blocks (validated):      312296 (avg. block size 5686200 B) (Total
> > open file blocks (not validated): 4)
> >  Minimally replicated blocks:   312296 (100.0 %)
> >  Over-replicated blocks:        1 (3.2020904E-4 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     3.000003
> >  Corrupt blocks:                0
> >  Missing replicas:              0 (0.0 %)
> >  Number of data-nodes:          7
> >  Number of racks:               1
> >
> >
> > The filesystem under path '/' is HEALTHY
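> >
> > For reference, the listing above came from running fsck against the
> > root path, something like:
> >
> >   hadoop fsck /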
> >
>
