hbase-user mailing list archives

From Patrick Angeles <patr...@cloudera.com>
Subject Re: How could I make sure the famous "xceiver" parameters works in the data node?
Date Fri, 13 May 2011 13:56:04 GMT
Hey Stanley,

What J-D was trying to say is that you are not hitting the xceiver limit;
you have a different problem.

(If you hit the Xceiver limit, you will get a message like "xceiverCount 258
exceeds the limit of concurrent *xcievers* 256" in the logs.)
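
For reference, the property itself lives in hdfs-site.xml on every datanode;
the name keeps Hadoop's historical misspelling, and 4096 is the value commonly
recommended for HBase clusters:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

(A datanode restart is required for the change to take effect.)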

There is not enough information in the logs you posted to get to the root of
the problem, but it appears that you have some block corruption on the
datanodes. Run "hadoop fsck /" to get some diagnostics.
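
A sketch of the diagnostics, assuming a machine with the Hadoop client
configured (the datanode pid below is a placeholder):

  # Per-file block report; look for CORRUPT or MISSING blocks.
  hadoop fsck / -files -blocks -locations

  # Cluster-wide datanode summary (capacity, usage, last contact).
  hadoop dfsadmin -report

  # Rough count of live DataXceiver threads on one datanode.
  jstack <datanode-pid> | grep -c DataXceiver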


On Fri, May 13, 2011 at 5:53 AM, Stanley Xu <wenhao.xu@gmail.com> wrote:

> Hi Jean,
>
> Sorry, I am not a native speaker. What do you mean by "The error message
> in the datanode log is pretty obvious about the config if you really hit
> it"?
>
> A fuller DataNode log looks like the following. It now seems more likely
> that the blocks were moved somewhere else without the region server
> knowing about it. Are there any settings I should look at?
>
> 2011-05-12 01:23:55,317 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_2440422069461309270_3925117 file /hadoop/dfs/data/current/subdir19/subdir36/blk_2440422069461309270
>
> The data node had just deleted the block, yet it then tried to serve the
> same block again, as follows:
>
> 2011-05-12 01:32:37,444 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):Got exception while serving blk_2440422069461309270_3925117 to /10.0.2.12:
> java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2011-05-12 01:32:37,444 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>         at java.lang.Thread.run(Thread.java:619)
>
> And it was still trying to serve the block 12 hours later:
>
> 2011-05-12 12:14:40,257 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):Got exception while serving blk_2440422069461309270_3925117 to /10.0.2.12:
> java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2011-05-12 12:14:40,257 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
>         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>         at java.lang.Thread.run(Thread.java:619)
>
> On the name node, we found log entries like the following:
>
> 2011-05-12 01:23:54,023 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.13:50010 to delete blk_-7967494102380684849_3924263
> 2011-05-12 01:23:54,023 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.14:50010 to delete *blk_2440422069461309270_3925117* blk_-946722429317009208_3986596
>
> 2011-05-12 01:24:06,032 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.12:50010 to delete blk_232241225919439230_3925136 blk_3644274217045813332_3954711 blk_-5566167900779836043_3946995 blk_7698571737096469832_3922371 blk_6567693665925295269_3925143 blk_-7967494102380684849_3924263 blk_761830835124422304_3954691 blk_22871562856813954_3924967 blk_-108718719454734123_3924500 blk_-5428140231738575121_3924529 *blk_2440422069461309270_3925117* blk_7672999872894289875_3946328 blk_563378129567617626_3924835 blk_9167573765889664081_3924480 blk_-6121243797820261553_3948685
>
> 2011-05-12 01:24:06,032 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.34:50010 to delete blk_8917364772641481342_3924301 *blk_2440422069461309270_3925117* blk_7231518316925385949_3924812 blk_5636355579130421598_3953434 blk_-8159096223500224670_3923072 blk_5958792757745258261_3948355
>
> Best wishes,
> Stanley Xu
>
>
>
> On Fri, May 13, 2011 at 1:06 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
> > The error message in the datanode log is pretty obvious about the
> > config if you really hit it. The error message you pasted from the DN
> > doesn't look complete either.
> >
> > J-D
> >
> > On Wed, May 11, 2011 at 7:30 AM, Stanley Xu <wenhao.xu@gmail.com> wrote:
> > > Dear all,
> > >
> > > We are using Hadoop 0.20.2 with a couple of patches, and HBase 0.20.6.
> > > When we run a MapReduce job that does a lot of random access to an
> > > HBase table, we see many log entries like the following, at the same
> > > time, on the region server and the data node:
> > >
> > > For RegionServer:
> > > "INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_7212216405058183301_3974453 from any node:  java.io.IOException: No live nodes contain current block"
> > > "WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.0.2.44:50010 for file /hbase/CookieTag/197333923/VisitStrength/151799904199528367 for block 7212216405058183301:java.io.IOException: Got error in response to OP_READ_BLOCK for file /hbase/CookieTag/197333923/VisitStrength/151799904199528367 for block 7212216405058183301"
> > >
> > > For DataNode:
> > > "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.26:50010, storageID=DS-1332752738-192.168.11.99-50010-1285486780176, infoPort=50075, ipcPort=50020):DataXceiver
> > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
> > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)"
> > >
> > > We changed the dfs.datanode.max.xcievers parameter to 4096 in
> > > hdfs-site.xml in both the Hadoop and HDFS configurations. But when we
> > > used VisualVM to connect to a data node, we found fewer than 100
> > > DataXceiver threads (close to 100: we counted 97 in a thread dump, and
> > > I am guessing the missing 3 had just finished or just started). The
> > > thread dump entries look like the following:
> > >
> > > "org.apache.hadoop.hdfs.server.datanode.DataXceiver@3546286a" - Thread
> > > t@2941003
> > >   java.lang.Thread.State: RUNNABLE
> > > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)"
> > >
> > > I guess our setting in hdfs-site.xml didn't really take effect. We
> > > have restarted the Hadoop cluster with stop-dfs.sh and start-dfs.sh,
> > > and restarted HBase as well. Could anyone tell me how to make sure the
> > > xceiver parameter works, or what else I should do besides restarting
> > > DFS and HBase? Is there any check I could do in the web interface or
> > > anywhere else? (A verification sketch follows at the end of this
> > > message.)
> > >
> > > Thanks in advance.
> > >
> > > Best wishes,
> > > Stanley Xu
> > >
> >
>
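
One more way to verify which xceiver value Hadoop's Configuration machinery
actually resolves, independent of any running daemon, is a tiny Java program
(a sketch; the class name is hypothetical, and it assumes the datanode's conf
directory containing hdfs-site.xml is on the classpath):

  // CheckXceivers.java -- print the resolved xceiver limit.
  // Assumes hdfs-site.xml is on the classpath; 256 is the Hadoop 0.20 default.
  import org.apache.hadoop.conf.Configuration;

  public class CheckXceivers {
      public static void main(String[] args) {
          Configuration conf = new Configuration();
          conf.addResource("hdfs-site.xml"); // pull in the HDFS overrides
          System.out.println("dfs.datanode.max.xcievers = "
                  + conf.getInt("dfs.datanode.max.xcievers", 256));
      }
  }

If this prints 4096 when run against the datanode's conf directory, the
setting is being read, and the "is not valid" errors above point at block
corruption rather than an xceiver shortage.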
