hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: region server outage leading to mapper timeout
Date Thu, 25 Nov 2010 01:00:36 GMT
I am attaching the jstack output I collected from the region servers that
might have problems (grid07, grid08 and grid11).

grid07 was doing a minor compaction:

"regionserver/10.202.50.107:60020.compactor" daemon prio=10 tid=0x000000004d150800 nid=0x857 runnable [0x0000000043e6a000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
        - locked <0x00002aaac06338d8> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
        at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
        at org.apache.hadoop.hbase.io.hfile.Compression$FinishOnFlushCompressionStream.write(Compression.java:62)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        - locked <0x00002aaabb870410> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x00002aaabb871468> (a java.io.DataOutputStream)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:522)
        at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:481)
        at org.apache.hadoop.hbase.regionserver.MinorCompactingStoreScanner.next(MinorCompactingStoreScanner.java:96)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:922)
        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:765)
        - locked <0x00002aaac2f09d28> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:833)
        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:786)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:93)
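
For comparing the dumps from several region servers, a minimal sketch of a helper that lists each thread's name and state from saved jstack output. This is only an illustration and not part of HBase or the JDK; the function names `summarize_jstack` and `count_states` are hypothetical, and it assumes each thread entry starts with a quoted name on one line, followed by a `java.lang.Thread.State:` line, as in the dump above.

```python
import re
from collections import Counter

# Header line of a thread entry, e.g.
#   "regionserver/10.202.50.107:60020.compactor" daemon prio=10 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# State line, e.g. "   java.lang.Thread.State: RUNNABLE"
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State:\s+(?P<state>[A-Z_]+)')


def summarize_jstack(text):
    """Return (thread_name, state) pairs found in a jstack dump.

    Threads that have no Thread.State line are skipped.
    """
    threads = []
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        state = THREAD_STATE.search(line)
        if state and current is not None:
            threads.append((current, state.group("state")))
            current = None
    return threads


def count_states(threads):
    """Count how many threads are in each state."""
    return Counter(state for _, state in threads)
```

Filtering the result for names ending in `.compactor` would show, per server, whether the compaction thread is RUNNABLE (as on grid07) or blocked.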

On Wed, Nov 24, 2010 at 4:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Hi,
> We use 0.20.6 to process a large amount of data:
> FILE_BYTES_WRITTEN 132,953,083,977
> Map output bytes     300,214,289,928
>
> In two of our mappers that timed out, I saw:
>
> 2010-11-24 23:16:51,561 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, host=us01-ciqps1-name01.carrieriq.com:2181 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@f855562
> 2010-11-24 23:16:51,563 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
> 2010-11-24 23:16:51,585 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server us01-ciqps1-name01.carrieriq.com/10.202.50.100:2181
> 2010-11-24 23:16:51,593 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.202.50.101:63047 remote=us01-ciqps1-name01.carrieriq.com/10.202.50.100:2181]
> 2010-11-24 23:16:51,596 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> 2010-11-24 23:16:55,127 INFO com.carrieriq.m2m.platform.mmp2.input.StripedHBaseTableInputFormat: Starting scan of table 'GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555'
>
> As of this moment, GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555
> has been deleted because of failure handling in our flow.
>
> Our monitoring script started noticing the following at 2010-11-24 23-39-50
> (GMT):
>
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Version: 0.20.6, r965666, Mon Jul 19 15:48:07 PDT 2010
> get 'GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555','7B7C0D0BC834B8BD53422AFA94023223'
> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server us01-ciqps1-grid12.carrieriq.com:60020 for region GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555,7B7C0D0BC834B8BD53422AFA94023223,1290638846310, row '7B7C0D0BC834B8BD53422AFA94023223', but failed after 7 attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to us01-ciqps1-grid12.carrieriq.com/10.202.50.112:60020 after attempts=1
>
> I have collected the region server logs (where I found occurrences of
> GRID-GRIDSQL-STAGING-THREEGPPSPEECHCALLS-1290634808555) and the master log.
> I can send the zipped tarball to you upon request.
>
> Have a nice holiday.
>
