hbase-user mailing list archives

From Nicolas Liochon <nkey...@gmail.com>
Subject Re: Slow region server recoveries
Date Mon, 22 Apr 2013 07:51:58 GMT
The recovery starts at 27:40 (master log: 00:27:40,011), so before the
datanode is known as stale.
But the first attempt is cancelled, and a new one starts at 28:10, before
being cancelled again at 28:35 (that's HBASE-6738).
These two attempts should see the datanode as stale, yet it seems they
don't: the block locations are not reordered. Possible explanations: the
stale ratio is not configured, the other nodes were stale as well, or HDFS
simply does not see the node as stale.
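
For reference, the knobs I mean are the HDFS-3703/HDFS-3912 ones; a minimal
sketch below, with illustrative values only (set programmatically here,
normally they live in hdfs-site.xml -- double check against your version):

import org.apache.hadoop.conf.Configuration;

public class StaleNodeSettings {
  public static Configuration sketch() {
    Configuration conf = new Configuration();
    // NN avoids stale datanodes for reads (HDFS-3703).
    conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
    // NN avoids stale datanodes for writes (HDFS-3912).
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    // A node is stale after 20 s without a heartbeat (the value discussed here).
    conf.setLong("dfs.namenode.stale.datanode.interval", 20000L);
    // Write avoidance is disabled if too many nodes are stale; this is the
    // "stale ratio" mentioned above.
    conf.setFloat("dfs.namenode.write.stale.datanode.ratio", 0.5f);
    return conf;
  }
}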

In any case, the story would be different with HBASE-8276 (i.e. HBASE-6738
backport to 94.7) + HBASE-6435.
If you have a reproduction scenario, it's worth trying.

For HBASE-8389, yes, we need this to work...

Nicolas
<https://issues.apache.org/jira/browse/HBASE-8389>




On Sun, Apr 21, 2013 at 7:38 PM, Varun Sharma <varun@pinterest.com> wrote:

> Hi Ted, Nicholas,
>
> Thanks for the comments. We found some issues with lease recovery and I
> patched HBASE-8354 to ensure we don't see data loss. Could you please look
> at HDFS-4721 and HBASE-8389?
>
> Thanks
> Varun
>
>
> On Sat, Apr 20, 2013 at 10:52 AM, Varun Sharma <varun@pinterest.com>
> wrote:
>
> > The important thing to note is that the block for this rogue WAL is in
> > UNDER_RECOVERY state. I have repeatedly asked HDFS dev whether the stale
> > node logic kicks in correctly for UNDER_RECOVERY blocks, but haven't
> > gotten an answer.
> >
> >
> > On Sat, Apr 20, 2013 at 10:47 AM, Varun Sharma <varun@pinterest.com> wrote:
> >
> >> Hi Nicholas,
> >>
> >> Regarding the following, I think this is not a recovery - the file below
> >> is an HFile and is being accessed on a get request. On this cluster, I
> >> don't have block locality. I see these exceptions for a while and then
> >> they are gone, which means the stale node logic kicks in.
> >>
> >> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> >> connect to /10.156.194.94:50010 for file
> >> /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0
> >> for block
> >>
> >> This is the real bummer. The stale datanode is still listed first even
> >> 90 seconds afterwards.
> >>
> >> *2013-04-19 00:28:35*,777 WARN
> >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> >> hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> >> failed, returning error
> >> java.io.IOException: Cannot obtain block length for
> >> LocatedBlock{BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056;
> >> getBlockSize()=0; corrupt=false; offset=0; locs=*[10.156.194.94:50010,
> >> 10.156.192.106:50010, 10.156.195.38:50010]}*
> >>         at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:238)
> >>         at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:182)
> >>         at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
> >>         at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
> >>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1080)
> >>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
> >>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:78)
> >>         at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1787)
> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
> >>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1707)
> >>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:717)
> >>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:821)
> >>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:734)
> >>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
> >>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:348)
> >>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
> >>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
> >>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
> >>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
> >>         at java.lang.Thread.run(Thread.java:662)
> >>
> >>
> >>
> >> On Sat, Apr 20, 2013 at 1:16 AM, Nicolas Liochon <nkeywal@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I looked at it again with a fresh eye. As Varun was saying, the root
> >>> cause
> >>> is the wrong order of the block locations.
> >>>
> >>> The root cause of the root cause is actually simple: HBASE started the
> >>> recovery while the node was not yet stale from an HDFS pov.
> >>>
> >>> Varun mentioned this timing:
> >>> Lost Beat: 27:30
> >>> Became stale: 27:50 - this is a guess, reverse engineered from the
> >>> stale timeout (20 seconds)
> >>> Became dead: 37:51
> >>>
> >>> But the recovery started at 27:13 (15 seconds before we have this log
> >>> line):
> >>> 2013-04-19 00:27:28,432 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> >>> connect to /10.156.194.94:50010 for file
> >>> /hbase/feeds/1479495ad2a02dceb41f093ebc29fe4f/home/02f639bb43944d4ba9abcf58287831c0
> >>> for block
> >>> BP-696828882-10.168.7.226-1364886167971:blk_-5977178030490858298_99853:java.net.SocketTimeoutException:
> >>> 15000 millis timeout while waiting for channel to be ready for connect.
> >>> ch : java.nio.channels.SocketChannel[connection-pending remote=/10.156.194.94:50010]
> >>>
> >>> So when we took the blocks from the NN, the datanode was not stale, so
> >>> you
> >>> have the wrong (random) order.
> >>>
> >>> ZooKeeper can expire a session before the timeout. I don't know why it
> >>> did so in this case, but I don't consider it a ZK bug: if ZK knows that a
> >>> node is dead, it is its role to expire the session. There is something
> >>> more fishy: we started the recovery while the datanode was still
> >>> responding to heartbeats. I don't know why. Maybe the OS was able to
> >>> kill -15 the RS before vanishing away.
> >>>
> >>> Anyway, we then have an exception when we try to connect, because the
> RS
> >>> does not have a TCP connection to this datanode. And this is retried
> many
> >>> times.
> >>>
> >>> You would not have this with trunk, because HBASE-6435 reorders the
> >>> blocks inside the client, using information not available to the NN,
> >>> excluding the datanode of the region server under recovery.
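> >>>
> >>> (As a toy sketch of the idea -- this is not the actual HBASE-6435 code,
> >>> just the shape of it: push the replica hosted on the dead RS's datanode
> >>> to the end of the location list so the client tries the healthy ones
> >>> first:)
> >>>
> >>> import java.util.Collections;
> >>> import java.util.Comparator;
> >>> import java.util.List;
> >>>
> >>> public class LocationReorder {
> >>>   // Stable sort: healthy replicas keep their relative order, the replica
> >>>   // on the dead region server's host is moved to the end and tried last.
> >>>   static void deprioritize(List<String> locations, final String deadHost) {
> >>>     Collections.sort(locations, new Comparator<String>() {
> >>>       public int compare(String a, String b) {
> >>>         return Integer.compare(a.startsWith(deadHost) ? 1 : 0,
> >>>                                b.startsWith(deadHost) ? 1 : 0);
> >>>       }
> >>>     });
> >>>   }
> >>> }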
> >>>
> >>> Some conclusions:
> >>>  - we should likely backport HBASE-6435 to 0.94.
> >>>  - I will revive HDFS-3706 and HDFS-3705 (the non-hacky way to get
> >>> HBASE-6435).
> >>>  - There is some stuff that could be better in HDFS. I will see.
> >>>  - I'm worried by the SocketTimeoutException. We should get NoRouteToHost
> >>> at some point, and we don't. That's also why it takes ages. I think it's
> >>> an AWS thing, but it brings two issues: it's slow, and, in HBase, you
> >>> don't know if the operation could have been executed or not, so it adds
> >>> complexity to some scenarios. If someone with enough network and AWS
> >>> knowledge could clarify this point it would be great.
> >>>
> >>>  Cheers,
> >>>
> >>>  Nicolas
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Apr 19, 2013 at 10:10 PM, Varun Sharma <varun@pinterest.com>
> >>> wrote:
> >>>
> >>> > This is 0.94.3 hbase...
> >>> >
> >>> >
> >>> > On Fri, Apr 19, 2013 at 1:09 PM, Varun Sharma <varun@pinterest.com>
> >>> wrote:
> >>> >
> >>> > > Hi Ted,
> >>> > >
> >>> > > I had a long offline discussion with Nicholas on this. Looks like
> >>> > > the last block, which was still being written to, took an enormous
> >>> > > time to recover. Here's what happened.
> >>> > > a) Master creates the split tasks and region servers process them
> >>> > > b) Region server tries to recover the lease for each WAL log - most
> >>> > > cases are noops since they are already rolled over/finalized
> >>> > > c) The last file's lease recovery takes some time since the crashing
> >>> > > server was writing to it and had a lease on it - but basically we have
> >>> > > the lease 1 minute after the server was lost (rough sketch after this
> >>> > > list)
> >>> > > d) Now we start the recovery for this but we end up hitting the stale
> >>> > > datanode, which is puzzling.
> >>> > >
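> >>> > > (Roughly, the lease recovery step in (b)/(c) boils down to something
> >>> > > like the following -- an illustrative sketch, not the actual HBase
> >>> > > code; recoverLease() returns true immediately for the finalized logs,
> >>> > > and only the WAL that was still being written needs retries:)
> >>> > >
> >>> > > import org.apache.hadoop.fs.Path;
> >>> > > import org.apache.hadoop.hdfs.DistributedFileSystem;
> >>> > >
> >>> > > public class LeaseRecoverySketch {
> >>> > >   // Keep asking the NN to recover the lease until the file is closed.
> >>> > >   static void recoverLease(DistributedFileSystem dfs, Path wal) throws Exception {
> >>> > >     while (!dfs.recoverLease(wal)) {
> >>> > >       Thread.sleep(1000);  // retry; the in-flight WAL can take a while
> >>> > >     }
> >>> > >   }
> >>> > > }
> >>> > >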
> >>> > > It seems that we did not hit the stale datanode when we were trying
> >>> > > to recover the finalized WAL blocks with trivial lease recovery.
> >>> > > However, for the final block, we hit the stale datanode. Any clue why
> >>> > > this might be happening?
> >>> > >
> >>> > > Varun
> >>> > >
> >>> > >
> >>> > > On Fri, Apr 19, 2013 at 10:40 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>> > >
> >>> > >> Can you show a snippet from the DN log which mentioned UNDER_RECOVERY?
> >>> > >>
> >>> > >> Here is the criteria for stale node checking to kick in (from
> >>> > >> https://issues.apache.org/jira/secure/attachment/12544897/HDFS-3703-trunk-read-only.patch):
> >>> > >>
> >>> > >> +   * Check if the datanode is in stale state. Here if
> >>> > >> +   * the namenode has not received heartbeat msg from a
> >>> > >> +   * datanode for more than staleInterval (default value is
> >>> > >> +   * {@link DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_MILLI_DEFAULT}),
> >>> > >> +   * the datanode will be treated as stale node.
> >>> > >>
> >>> > >>
> >>> > >> On Fri, Apr 19, 2013 at 10:28 AM, Varun Sharma <varun@pinterest.com> wrote:
> >>> > >>
> >>> > >> > Is there a place to upload these logs ?
> >>> > >> >
> >>> > >> >
> >>> > >> > On Fri, Apr 19, 2013 at 10:25 AM, Varun Sharma <varun@pinterest.com> wrote:
> >>> > >> >
> >>> > >> > > Hi Nicholas,
> >>> > >> > >
> >>> > >> > > Attached are the namenode, dn logs (of one of the healthy
> >>> > >> > > replicas of the WAL block) and the rs logs which got stuck doing
> >>> > >> > > the log split. Action begins at 2013-04-19 00:27*.
> >>> > >> > >
> >>> > >> > > Also, the rogue block is 5723958680970112840_174056. It's very
> >>> > >> > > interesting to trace this guy through the HDFS logs (dn and nn).
> >>> > >> > >
> >>> > >> > > Btw, do you know what the UNDER_RECOVERY stage is for, in HDFS?
> >>> > >> > > Also, does the stale node stuff kick in for that state?
> >>> > >> > >
> >>> > >> > > Thanks
> >>> > >> > > Varun
> >>> > >> > >
> >>> > >> > >
> >>> > >> > > On Fri, Apr 19, 2013 at 4:00 AM, Nicolas Liochon <nkeywal@gmail.com> wrote:
> >>> > >> > >
> >>> > >> > >> Thanks for the detailed scenario and analysis. I'm going to
> >>> > >> > >> have a look. I can't access the logs
> >>> > >> > >> (ec2-107-20-237-30.compute-1.amazonaws.com timeouts), could you
> >>> > >> > >> please send them directly to me?
> >>> > >> > >>
> >>> > >> > >> Thanks,
> >>> > >> > >>
> >>> > >> > >> Nicolas
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> On Fri, Apr 19, 2013 at 12:46 PM, Varun Sharma <varun@pinterest.com> wrote:
> >>> > >> > >>
> >>> > >> > >> > Hi Nicholas,
> >>> > >> > >> >
> >>> > >> > >> > Here is the failure scenario, I have dug up the logs.
> >>> > >> > >> >
> >>> > >> > >> > A machine fails and stops accepting/transmitting traffic. The
> >>> > >> > >> > HMaster starts the distributed split for 13 tasks. There are
> >>> > >> > >> > 12 region servers. 12 tasks succeed but the 13th one takes a
> >>> > >> > >> > looong time.
> >>> > >> > >> >
> >>> > >> > >> > Zookeeper timeout is set to 30 seconds. Stale node timeout is
> >>> > >> > >> > 20 seconds. Both patches are there.
> >>> > >> > >> >
> >>> > >> > >> > a) Machine fails around 27:30
> >>> > >> > >> > b) Master starts the split around 27:40 and submits the tasks.
> >>> > >> > >> > The one task which fails seems to be the one which contains
> >>> > >> > >> > the WAL being currently written to:
> >>> > >> > >> >
> >>> > >> > >> > 2013-04-19 00:27:44,325 INFO
> >>> > >> > >> > org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
> >>> > >> > >> > hdfs://ec2-107-20-237-30.compute-1.amazonaws.com/hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141,
> >>> > >> > >> > length=0
> >>> > >> > >> >
> >>> > >> > >> > Basically this region server picks up the task but finds the
> >>> > >> > >> > length of this file to be 0 and drops it. This happens again.
> >>> > >> > >> >
> >>> > >> > >> > c) Finally another region server picks up the task but it
> >>> > >> > >> > ends up going to the bad datanode (which should not happen
> >>> > >> > >> > because of the stale node timeout). Unfortunately it hits the
> >>> > >> > >> > 45 retries and a connect timeout of 20 seconds every time.
> >>> > >> > >> > This delays recovery significantly. Now I guess reducing # of
> >>> > >> > >> > retries to 1 is one possibility.
> >>> > >> > >> > But then the namenode logs are very interesting.
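> >>> > >> > >> >
> >>> > >> > >> > (If those numbers come from the IPC client connect settings -
> >>> > >> > >> > my guess, since 45 and 20 seconds match their defaults exactly
> >>> > >> > >> > - the knobs would be something like the following; an
> >>> > >> > >> > illustrative sketch only, not verified against this cluster's
> >>> > >> > >> > config:)
> >>> > >> > >> >
> >>> > >> > >> > import org.apache.hadoop.conf.Configuration;
> >>> > >> > >> >
> >>> > >> > >> > public class RetrySketch {
> >>> > >> > >> >   static void tighten(Configuration conf) {
> >>> > >> > >> >     // Defaults: 20000 ms connect timeout, 45 retries on timeout,
> >>> > >> > >> >     // i.e. 45 x 20 s = 15 minutes spent on a dead node.
> >>> > >> > >> >     conf.setInt("ipc.client.connect.timeout", 20000);
> >>> > >> > >> >     conf.setInt("ipc.client.connect.max.retries.on.timeouts", 1);
> >>> > >> > >> >   }
> >>> > >> > >> > }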
> >>> > >> > >> >
> >>> > >> > >> > d) Namenode seems to be in a cyclic lease recovery loop until
> >>> > >> > >> > the node is marked dead. There is this one last block which
> >>> > >> > >> > exhibits this.
> >>> > >> > >> >
> >>> > >> > >> > 2013-04-19 00:28:09,744 INFO BlockStateChange: BLOCK*
> >>> > >> > >> > blk_-*5723958680970112840_174056*{blockUCState=UNDER_RECOVERY,
> >>> > >> > >> > primaryNodeIndex=1,
> >>> > >> > >> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> >>> > >> > >> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> >>> > >> > >> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]} recovery started,
> >>> > >> > >> > primary=10.156.192.106:50010
> >>> > >> > >> > 2013-04-19 00:28:09,744 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> >>> > >> > >> > NameSystem.internalReleaseLease: File
> >>> > >> > >> > /hbase/.logs/ip-10-156-194-94.ec2.internal,60020,1366323217601-splitting/ip-10-156-194-94.ec2.internal%2C60020%2C1366323217601.1366331156141
> >>> > >> > >> > has not been closed. Lease recovery is in progress. RecoveryId = 174413
> >>> > >> > >> > for block blk_-5723958680970112840_174056{blockUCState=UNDER_RECOVERY,
> >>> > >> > >> > primaryNodeIndex=1,
> >>> > >> > >> > replicas=[ReplicaUnderConstruction[10.156.194.94:50010|RBW],
> >>> > >> > >> > ReplicaUnderConstruction[10.156.192.106:50010|RBW],
> >>> > >> > >> > ReplicaUnderConstruction[10.156.195.38:50010|RBW]]}
> >>> > >> > >> >
> >>> > >> > >> > I see this over and over again in the logs until the datanode
> >>> > >> > >> > is marked dead. It seems to be cycling through the replicas
> >>> > >> > >> > for this WAL block and trying to add it to the recovery list.
> >>> > >> > >> > I looked at the code and it says:
> >>> > >> > >> >
> >>> > >> > >> >       // Cannot close file right now, since the last block requires recovery.
> >>> > >> > >> >       // This may potentially cause infinite loop in lease recovery
> >>> > >> > >> >       // if there are no valid replicas on data-nodes.
> >>> > >> > >> >       NameNode.stateChangeLog.warn(
> >>> > >> > >> >                 "DIR* NameSystem.internalReleaseLease: " +
> >>> > >> > >> >                 "File " + src + " has not been closed." +
> >>> > >> > >> >                 " Lease recovery is in progress. " +
> >>> > >> > >> >                 "RecoveryId = " + blockRecoveryId + " for block " + lastBlock);
> >>> > >> > >> >       break;
> >>> > >> > >> >
> >>> > >> > >> > Eventually for this block, we get
> >>> > >> > >> >
> >>> > >> > >> > 2013-04-19 00:41:20,736 INFO
> >>> > >> > >> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> >>> > >> > >> > commitBlockSynchronization(lastblock=BP-696828882-10.168.7.226-1364886167971:blk_-*5723958680970112840_174056*,
> >>> > >> > >> > newgenerationstamp=174413, newlength=119148648,
> >>> > >> > >> > newtargets=[10.156.192.106:50010, 10.156.195.38:50010],
> >>> > >> > >> > closeFile=true, deleteBlock=false)
> >>> > >> > >> > 2013-04-19 00:41:20,736 ERROR
> >>> > >> > >> > org.apache.hadoop.security.UserGroupInformation:
> >>> > >> > >> > PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException:
> >>> > >> > >> > Block (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056)
> >>> > >> > >> > not found
> >>> > >> > >> > 2013-04-19 00:41:20,736 INFO org.apache.hadoop.ipc.Server: IPC Server
> >>> > >> > >> > handler 35 on 8020, call
> >>> > >> > >> > org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization
> >>> > >> > >> > from 10.156.192.106:53271: error: java.io.IOException: Block
> >>> > >> > >> > (=BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056)
> >>> > >> > >> > not found
> >>> > >> > >> >
> >>> > >> > >> > On the datanode side, I see a call for recover blocks - I see
> >>> > >> > >> > that a write pipeline is there, which gets terminated with
> >>> > >> > >> > some socket timeouts...
> >>> > >> > >> >
> >>> > >> > >> > 00:28:11,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> >>> > >> > >> > NameNode at ec2-107-20-237-30.compute-1.amazonaws.com/10.168.7.226:8020
> >>> > >> > >> > calls
> >>> > >> > >> > recoverBlock(BP-696828882-10.168.7.226-1364886167971:blk_-5723958680970112840_174056,
> >>> > >> > >> > targets=[10.156.194.94:50010, 10.156.192.106:50010, 10.156.195.38:50010],
> >>> > >> > >> > newGenerationStamp=174413)
> >>> > >> > >> >
> >>> > >> > >> > Not sure, but this looks like a case where data could be lost?
> >>> > >> > >> >
> >>> > >> > >> > Varun
> >>> > >> > >> >
> >>> > >> > >> >
> >>> > >> > >> > On Fri, Apr 19, 2013 at 12:38 AM, Nicolas Liochon <nkeywal@gmail.com> wrote:
> >>> > >> > >> >
> >>> > >> > >> > > Hey Varun,
> >>> > >> > >> > >
> >>> > >> > >> > > Could you please share the logs and the configuration (hdfs
> >>> > >> > >> > > / hbase settings + cluster description)? What's the failure
> >>> > >> > >> > > scenario?
> >>> > >> > >> > > From an HDFS pov, HDFS 3703 does not change the dead node
> >>> > >> > >> > > status. But these nodes will be given the lowest priority
> >>> > >> > >> > > when reading.
> >>> > >> > >> > >
> >>> > >> > >> > >
> >>> > >> > >> > > Cheers,
> >>> > >> > >> > >
> >>> > >> > >> > > Nicolas
> >>> > >> > >> > >
> >>> > >> > >> > >
> >>> > >> > >> > > On Fri, Apr 19, 2013 at 3:01 AM, Varun Sharma <varun@pinterest.com> wrote:
> >>> > >> > >> > >
> >>> > >> > >> > > > Hi,
> >>> > >> > >> > > >
> >>> > >> > >> > > > We are facing problems with really slow HBase region
> >>> > >> > >> > > > server recoveries ~ 20 minutes. Version is hbase 0.94.3
> >>> > >> > >> > > > compiled with hadoop.profile=2.0.
> >>> > >> > >> > > >
> >>> > >> > >> > > > Hadoop version is CDH 4.2 with HDFS 3703 and HDFS 3912
> >>> > >> > >> > > > patched and stale node timeouts configured correctly. Time
> >>> > >> > >> > > > for dead node detection is still 10 minutes.
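> >>> > >> > >> > > >
> >>> > >> > >> > > > (For what it's worth, the ~10 minutes is just the NN
> >>> > >> > >> > > > heartbeat expiry formula - a quick sketch of the
> >>> > >> > >> > > > arithmetic, using the stock defaults rather than anything
> >>> > >> > >> > > > read from this cluster:)
> >>> > >> > >> > > >
> >>> > >> > >> > > > public class DeadNodeInterval {
> >>> > >> > >> > > >   public static void main(String[] args) {
> >>> > >> > >> > > >     long recheckMs = 5 * 60 * 1000; // dfs.namenode.heartbeat.recheck-interval default
> >>> > >> > >> > > >     long heartbeatMs = 3 * 1000;    // dfs.heartbeat.interval default (3 s)
> >>> > >> > >> > > >     // NN declares a DN dead after 2 * recheck + 10 * heartbeat.
> >>> > >> > >> > > >     long expiryMs = 2 * recheckMs + 10 * heartbeatMs;
> >>> > >> > >> > > >     System.out.println(expiryMs / 60000.0 + " minutes"); // 10.5 minutes
> >>> > >> > >> > > >   }
> >>> > >> > >> > > > }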
> >>> > >> > >> > > >
> >>> > >> > >> > > > We see that our region server is trying to read an HLog
> >>> > >> > >> > > > and is stuck there for a long time. Logs here:
> >>> > >> > >> > > >
> >>> > >> > >> > > > 2013-04-12 21:14:30,248 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> >>> > >> > >> > > > to connect to /10.156.194.251:50010 for file
> >>> > >> > >> > > > /hbase/feeds/fbe25f94ed4fa37fb0781e4a8efae142/home/1d102c5238874a5d82adbcc09bf06599
> >>> > >> > >> > > > for block
> >>> > >> > >> > > > BP-696828882-10.168.7.226-1364886167971:blk_-3289968688911401881_9428:java.net.SocketTimeoutException:
> >>> > >> > >> > > > 15000 millis timeout while waiting for channel to be ready for read.
> >>> > >> > >> > > > ch : java.nio.channels.SocketChannel[connected local=/10.156.192.173:52818
> >>> > >> > >> > > > remote=/10.156.194.251:50010]
> >>> > >> > >> > > >
> >>> > >> > >> > > > I would think that HDFS 3703 would make the server fail
> >>> > >> > >> > > > fast and go to the third datanode. Currently, the recovery
> >>> > >> > >> > > > seems way too slow for production usage...
> >>> > >> > >> > > >
> >>> > >> > >> > > > Varun
> >>> > >> > >> > > >
> >>> > >> > >> > >
> >>> > >> > >> >
> >>> > >> > >>
> >>> > >> > >
> >>> > >> > >
> >>> > >> >
> >>> > >>
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
