hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Slow region server recoveries
Date Fri, 19 Apr 2013 17:40:02 GMT
Can you show a snippet from the DN log which mentions UNDER_RECOVERY?

Here is the criterion for stale node checking to kick in (from
https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch):

+   * Check if the datanode is in stale state. Here if
+   * the namenode has not received heartbeat msg from a
+   * datanode for more than staleInterval (default value is
+   * {@link DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
+   * the datanode will be treated as stale node.
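
In other words, here is a minimal sketch of the check described above (assuming the Hadoop 2.x DatanodeInfo API and the dfs.namenode.stale.datanode.interval key; this is not the exact HDFS-3703 code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class StaleCheckSketch {
      // A datanode counts as stale once the NameNode has gone longer than the
      // configured interval without receiving a heartbeat from it.
      static boolean isStale(DatanodeInfo node, Configuration conf) {
        long staleIntervalMs =
            conf.getLong("dfs.namenode.stale.datanode.interval", 30000L); // 30s default
        return System.currentTimeMillis() - node.getLastUpdate() > staleIntervalMs;
      }
    }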


On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma <varun@pinterest.com> wrote:

> Is there a place to upload these logs?
>
>
> On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma <varun@pinterest.com> wrote:
>
> > Hi Nicolas,
> >
> > Attached are the namenode and dn logs (of one of the healthy replicas of
> > the WAL block) and the rs logs which got stuck doing the log split. Action
> > begins at 2013-04-19 00:27*.
> >
> > Also, the rogue block is 5723958680970112840_174056. It's very interesting
> > to trace this block through the HDFS logs (dn and nn).
> >
> > Btw, do you know what the UNDER_RECOVERY stage is for in HDFS? Also, does
> > the stale node handling kick in for that state?
> >
> > Thanks
> > Varun
> >
> >
> > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon <nkeywal@gmail.com> wrote:
> >
> >> Thanks for the detailed scenario and analysis. I'm going to have a look.
> >> I can't access the logs (ec2-107-20-237-30.compute-1.amazonaws.com
> >> times out), could you please send them directly to me?
> >>
> >> Thanks,
> >>
> >> Nicolas
> >>
> >>
> >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma <varun@pinterest.com> wrote:
> >>
> >> > Hi Nicolas,
> >> >
> >> > Here is the failure scenario, I have dug up the logs.
> >> >
> >> > A machine fails and stops accepting/transmitting traffic. The HMaster
> >> > starts the distributed split for 13 tasks. There are 12 region servers.
> >> > 12 tasks succeed but the 13th one takes a very long time.
> >> >
> >> > The Zookeeper timeout is set to 30 seconds and the stale node timeout to
> >> > 20 seconds. Both patches are applied.
> >> >
> >> > a) The machine fails around 00:27:30.
> >> > b) The master starts the split around 00:27:40 and submits the tasks. The
> >> > one task which fails seems to be the one which contains the WAL currently
> >> > being written to:
> >> >
> >> > 2013-04-19 00:27:44,325 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141, length=0
> >> >
> >> > Basically this region server picks up the task but finds the length of
> >> > this file to be 0 and drops it. This happens again.
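
As an aside, the usual way for a reader to get a trustworthy length for a WAL that is still open is to force lease recovery first and poll until it completes. A rough sketch, assuming the Hadoop 2.x DistributedFileSystem#recoverLease API and a WAL path passed on the command line (this is not HBase's exact splitter code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class RecoverLeaseSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        Path wal = new Path(args[0]);  // path of the WAL under the -splitting directory
        // Until recoverLease() returns true, the last block is still under
        // recovery and the length the NameNode reports (often 0) cannot be trusted.
        boolean recovered = dfs.recoverLease(wal);
        while (!recovered) {
          Thread.sleep(1000);          // arbitrary poll interval
          recovered = dfs.recoverLease(wal);
        }
        System.out.println("recovered length = " + dfs.getFileStatus(wal).getLen());
      }
    }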
> >> >
> >> > c) Finally another region server picks up the task, but it ends up going
> >> > to the bad datanode (which should not happen because of the stale node
> >> > timeout). Unfortunately it hits the 45 retries and a connect timeout of
> >> > 20 seconds every time. This delays recovery significantly. Now I guess
> >> > reducing the number of retries to 1 is one possibility.
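
For what it's worth, those numbers match the stock Hadoop IPC client settings; here is a hedged sketch of turning them down, assuming the standard Hadoop 2.x keys ipc.client.connect.max.retries.on.timeouts and ipc.client.connect.timeout (the replacement values are only illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class IpcRetrySketch {
      // The 45 retries and 20-second connect timeout match the Hadoop 2.x IPC
      // client defaults; turning them down bounds how long a client can block
      // on a dead datanode's IPC port. Values below are illustrative only.
      public static Configuration lowRetryConf() {
        Configuration conf = new Configuration();
        conf.setInt("ipc.client.connect.max.retries.on.timeouts", 1); // default 45
        conf.setInt("ipc.client.connect.timeout", 3000);              // default 20000 ms
        return conf;
      }
    }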
> >> > But then the namenode logs are very interesting.
> >> >
> >> > d) The namenode seems to be in a cyclic lease recovery loop until the
> >> > node is marked dead. There is this one last block which exhibits this.
> >> >
> >> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK* blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW], ReplicaUnderConstruction[10.156.192.106:50010|RBW], ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started, primary=10.156.192.106:50010
> >> > 2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141 has not been closed. Lease recovery is in progress. RecoveryId = 174413 for block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW], ReplicaUnderConstruction[10.156.192.106:50010|RBW], ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}
> >> >
> >> > I see this over and over again in the logs until the datanode is marked
> >> > dead. It seems to be cycling through the replicas for this WAL block and
> >> > trying to add it to the recovery list. I looked at the code and it says:
> >> >
> >> >       // Cannot close file right now, since the last block requires recovery.
> >> >       // This may potentially cause infinite loop in lease recovery
> >> >       // if there are no valid replicas on data-nodes.
> >> >       NameNode.stateChangeLog.warn(
> >> >                 "DIR* NameSystem.internalReleaseLease: " +
> >> >                 "File " + src + " has not been closed." +
> >> >                 " Lease recovery is in progress. " +
> >> >                 "RecoveryId = " + blockRecoveryId + " for block " + lastBlock);
> >> >       break;
> >> >
> >> > Eventually for this block, we get:
> >> >
> >> > 2013-04-19 00:41:20,736 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056, newgenerationstamp=174413, newlength=119148648, newtargets=[10.156.192.106:50010, 10.156.195.38:50010], closeFile=true, deleteBlock=false)
> >> > 2013-04-19 00:41:20,736 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056) not found
> >> > 2013-04-19 00:41:20,736 INFO org.apache.hadoop.ipc.Server: IPC Server handler 35 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization from 10.156.192.106:53271: error: java.io.IOException: Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056) not found
> >> >
> >> > On the datanode side, I see a call to recover blocks - I see that a
> >> > write pipeline is there, which gets terminated with some socket timeouts...
> >> >
> >> > 00:28:11,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode at ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020 calls recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056, targets=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010], newGenerationStamp=174413)
> >> >
> >> > I'm not sure, but this looks like a case where data could be lost?
> >> >
> >> > Varun
> >> >
> >> >
> >> > On Fri, Apr 19, 2013 at 12:38 AM, Nicolas Liochon <nkeywal@gmail.com> wrote:
> >> >
> >> > > Hey Varun,
> >> > >
> >> > > Could you please share the logs and the configuration (hdfs / hbase
> >> > > settings + cluster description)? What's the failure scenario?
> >> > > From an HDFS pov, HDFS-3703 does not change the dead node status. But
> >> > > these nodes will be given the lowest priority when reading.
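
Put differently, the stale replica stays in the block's location list; it is just ordered last. A rough illustration of that ordering (a hypothetical helper, not the actual NameNode sorting code; it assumes DatanodeInfo#isStale(long) from the Hadoop 2.x client API):

    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class StaleLastOrderingSketch {
      // Reorder a block's replica locations so stale datanodes sort last; a
      // client working through the list then reaches the healthy replicas first
      // and only falls back to the stale node if everything else fails.
      static void sortStaleLast(DatanodeInfo[] locations, final long staleIntervalMs) {
        Arrays.sort(locations, new Comparator<DatanodeInfo>() {
          @Override
          public int compare(DatanodeInfo a, DatanodeInfo b) {
            return Boolean.compare(a.isStale(staleIntervalMs), b.isStale(staleIntervalMs));
          }
        });
      }
    }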
> >> > >
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Nicolas
> >> > >
> >> > >
> >> > > On Fri, Apr 19, 2013 at 3:01 AM, Varun Sharma <varun@pinterest.com> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > We are facing problems with really slow HBase region server recoveries
> >> > > > of ~20 minutes. The version is hbase 0.94.3 compiled with
> >> > > > hadoop.profile=2.0.
> >> > > >
> >> > > > The Hadoop version is CDH 4.2 with HDFS-3703 and HDFS-3912 patched and
> >> > > > stale node timeouts configured correctly. The time for dead node
> >> > > > detection is still 10 minutes.
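
As a side note, the ~10 minutes for dead node detection falls out of the NameNode's heartbeat settings rather than the stale interval; a hedged sketch of the arithmetic, assuming the usual Hadoop 2.x keys and defaults:

    import org.apache.hadoop.conf.Configuration;

    public class DeadNodeIntervalSketch {
      // The NameNode only declares a datanode dead after roughly
      //   2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
      // With the defaults (300000 ms and 3 s) that is 630000 ms, about 10.5
      // minutes, which matches the ~10 minute dead node detection mentioned
      // above even though the stale timeout is only 20 seconds.
      static long heartbeatExpireIntervalMs(Configuration conf) {
        long recheckMs = conf.getLong("dfs.namenode.heartbeat.recheck-interval", 5 * 60 * 1000L);
        long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3L);
        return 2 * recheckMs + 10 * 1000L * heartbeatSec;
      }
    }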
> >> > > >
> >> > > > We see that our region server is trying to read an HLog and is stuck
> >> > > > there for a long time. Logs here:
> >> > > >
> >> > > > 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.156.194.251:50010 for file /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599 for block BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818 remote=/10.156.194.251:50010]
> >> > > >
> >> > > > I would think that HDFS-3703 would make the server fail fast and go
> >> > > > to the third datanode. Currently, the recovery seems way too slow for
> >> > > > production usage...
> >> > > >
> >> > > > Varun
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
