hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject RE: Put slows down and eventually blocks
Date Thu, 20 May 2010 01:26:17 GMT
Answers below are prefixed with "Geoff:".
>What do the MR logs say?  Do they point to an explicit row or region for failed task?


Geoff: The MR syslog shows nothing abnormal that I can see. I am running just a single reducer
and autoflushing the Puts. I have pasted the reducer logs at the end of this email. As context
for that output: I instantiate the HTable in the reducer's setup method, and I close it in
cleanup.
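
For reference, here is a rough sketch of that setup/cleanup pattern, assuming the HBase 0.20.x
client API. The table name matches the one from the shell session below; the column family,
qualifier, and key/value types are placeholders made up for illustration:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SearchKeysReducer extends Reducer<Text, Text, Text, Text> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // One HTable per reduce task, created once in setup.
    table = new HTable(new HBaseConfiguration(), "SEARCH_KEYS");
    // Autoflush on: each Put goes to the regionserver immediately,
    // with no client-side write buffer.
    table.setAutoFlush(true);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // "cf" and "q" are placeholder family/qualifier names.
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(value.toString()));
      table.put(put); // this is the call that eventually blocks
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // Release the table at the end of the task; with autoflush on
    // there is nothing left buffered, so this is effectively a no-op.
    table.flushCommits();
  }
}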

>Can you trace life of that region by grepping it in master logs?

Geoff: I was writing the row key to stdout so I could collect debugging information, so I know
the row key. I grepped all of the Hadoop and HBase logs on all three servers that participate
in the little HBase cluster, and nothing greps out for the particular row key that the Put
hangs on.

>At time that the failing task runs, grep the regionservers logs for that time.  What is
going on? 

Geoff: Ahhh, glad you asked. I see the Put connection crapping out (of course, because the
task is killed), but ten minutes earlier a global memstore flush started, and from then until
the Put is killed it is wall-to-wall compaction and memstore flushing. It seems clear that
this is what caused the server to block the Put: the compaction and flushing went on for 10
minutes, at which point the task tracker killed the job, and the flushing and compactions
continued for a total of about 12 minutes. This also explains why the tasks are killed so
reliably at the same point each time I run this job - once the reducer has written a certain
amount of data, this flushing kicks in. So, should I try decreasing the size of the global
memstore? It seems like I have to trade throughput against these long pauses; the bigger the
memstore, the longer the pauses. Does that make sense? Any recommendations for the global
memstore size on a machine with 32 GB of RAM, a 64-bit JVM, and 8 GB of RAM per HBase process?
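
For what it's worth, the regionserver-side settings I believe govern this behaviour in HBase
0.20.x are the global memstore limits and the per-region flush size in hbase-site.xml. This is
only a sketch to show the knobs; the values are roughly the defaults, not tuned recommendations:

<!-- Sketch only: illustrative values, not tested recommendations. -->
<property>
  <!-- Fraction of regionserver heap that all memstores together may use
       before updates are blocked and flushes are forced. -->
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>
<property>
  <!-- Forced flushing continues until total memstore usage drops below this fraction. -->
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.25</value>
</property>
<property>
  <!-- Per-region memstore size (bytes) at which an individual flush is triggered. -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>67108864</value>
</property>

Lowering the upper limit (or the per-region flush size) should make flushes smaller and more
frequent, trading some write throughput for shorter pauses, which matches the trade-off above.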

Thanks -- we're getting somewhere. (MR syslog follows)

-geoff




2010-05-19 01:00:52,720 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=SHUFFLE, sessionId=
2010-05-19 01:00:52,894 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=361909440,
MaxSingleShuffleLimit=90477360
2010-05-19 01:00:52,901 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread started: Thread for merging on-disk files
2010-05-19 01:00:52,902 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread waiting: Thread for merging on-disk files
2010-05-19 01:00:52,903 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread started: Thread for merging in memory files
2010-05-19 01:00:52,904 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Need another 4 map output(s) where 0 is already in progress
2010-05-19 01:00:52,904 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2010-05-19 01:00:52,904 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread started: Thread for polling Map Completion Events
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output
of KILLED map-task: 'attempt_201005141621_0203_m_000001_1'
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output
of KILLED map-task: 'attempt_201005141621_0203_m_000002_1'
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output
of KILLED map-task: 'attempt_201005141621_0203_m_000003_1'
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3:
Got 4 new map-outputs
2010-05-19 01:00:57,906 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000000_0,
compressed len: 216313347, decompressed len: 216313343
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000001_0,
compressed len: 207291891, decompressed len: 207291887
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 216313343 bytes
(216313347 raw bytes) into Local-FS from attempt_201005141621_0203_m_000000_0
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 207291887 bytes
(207291891 raw bytes) into Local-FS from attempt_201005141621_0203_m_000001_0
2010-05-19 01:00:57,961 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000003_0,
compressed len: 222666366, decompressed len: 222666362
2010-05-19 01:00:57,961 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 222666362 bytes
(222666366 raw bytes) into Local-FS from attempt_201005141621_0203_m_000003_0
2010-05-19 01:00:57,996 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000002_0,
compressed len: 196007332, decompressed len: 196007328
2010-05-19 01:00:57,997 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 196007328 bytes
(196007332 raw bytes) into Local-FS from attempt_201005141621_0203_m_000002_0
2010-05-19 01:01:05,103 INFO org.apache.hadoop.mapred.ReduceTask: Read 207291891 bytes from
map-output for attempt_201005141621_0203_m_000001_0
2010-05-19 01:01:05,106 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,155 INFO org.apache.hadoop.mapred.ReduceTask: Read 222666366 bytes from
map-output for attempt_201005141621_0203_m_000003_0
2010-05-19 01:01:05,161 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,307 INFO org.apache.hadoop.mapred.ReduceTask: Read 216313347 bytes from
map-output for attempt_201005141621_0203_m_000000_0
2010-05-19 01:01:05,308 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,379 INFO org.apache.hadoop.mapred.ReduceTask: Read 196007332 bytes from
map-output for attempt_201005141621_0203_m_000002_0
2010-05-19 01:01:05,380 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3
Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,949 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-05-19 01:01:05,950 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-05-19 01:01:05,951 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2010-05-19 01:01:05,951 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge
complete: 4 files left.
2010-05-19 01:01:05,951 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete:
0 files left.
2010-05-19 01:01:05,956 INFO org.apache.hadoop.mapred.ReduceTask: Merging 4 files, 842278936
bytes from disk
2010-05-19 01:01:05,957 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes
from memory into reduce
2010-05-19 01:01:05,960 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2010-05-19 01:01:05,972 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
with 4 segments left of total size: 842278920 bytes
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565,
built on 12/08/2009 21:51 GMT
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ddbs-host7-dm0
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_17
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun
Microsystems Inc.
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_17/jre
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=/home/dmc2/hadoop/bin/../conf:/usr/java/default/lib/tools.jar:/home/dmc2/hadoop/bin/..:/home/dmc2/hadoop/bin/../hadoop-0.20.2-core.jar:/home/dmc2/hadoop/bin/../lib/commons-cli-1.2.jar:/home/dmc2/hadoop/bin/../lib/commons-codec-1.3.jar:/home/dmc2/hadoop/bin/../lib/commons-el-1.0.jar:/home/dmc2/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/dmc2/hadoop/bin/../lib/commons-logging-1.0.4.jar:/home/dmc2/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/dmc2/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/dmc2/hadoop/bin/../lib/core-3.1.1.jar:/home/dmc2/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/dmc2/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/dmc2/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/dmc2/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/dmc2/hadoop/bin/../lib/jetty-6.1.14.jar:/home/dmc2/hadoop/bin/../lib/jetty-util-6.1.14.jar:/home/dmc2/hadoop/bin/../lib/junit-3.8.1.jar:/home/dmc2/hadoop/bin/../lib/kfs-0.2.2.jar:/home/dmc2/hadoop/bin/../lib/log4j-1.2.15.jar:/home/dmc2/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/dmc2/hadoop/bin/../lib/oro-2.0.8.jar:/home/dmc2/hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/dmc2/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/dmc2/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/dmc2/hadoop/bin/../lib/xmlenc-0.52.jar:/home/dmc2/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/dmc2/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar::/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xsltc.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jh.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/opencsv-1.8.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xalan.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jsr311-api-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-core.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/Echo.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xsdlib.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/saaj-api.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/ant-launcher.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/junit.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/Echo_FileTransfer.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jta-spec1_0_1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-tools.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-io-1.4.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/servletapi-2.3.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-impl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-queries-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/json.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-spellchecker-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/relaxngDatatype.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xmlsec.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/saaj-impl.jar:/hive3/mapred/local/taskTracker/jobcache/job_2
01005141621_0203/jars/lib/commons-logging-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xws-security.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-codec-1.3.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-cli-1.2.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/workflow.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-core-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-streaming.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-libs.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-index.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/mail.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-xjc.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jcommon-0.9.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/log4j-1.2.16.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/ant.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xmldsig.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/decarta-util.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/marsh.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoopxmlstreamer.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xercesImpl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jax-qname.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/sax.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxp-api.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-ant.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-collections-3.2.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jxl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-snowball-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-dbcp-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/dom4j-full.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-analyzers-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-pool-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/EchoServer.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-httpclient-3.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-api.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/gnujaxp.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/echopoint-0.9.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/zookeeper-3.2.2.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/signpost-core-1.1-SNAPSHOT.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/mysql-connector-java-5.0.8-bin.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/shades-0.0.5.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/com
mons-fileupload-1.2.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/dds-head.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/namespace.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jersey-bundle-1.1.5.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xml4j.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/cobertura.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hbase-0.20.3.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jfreechart-0.9.16.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hsqldb-2.0.0-rc8.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/classes:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars:/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/dmc2/hadoop/bin/../lib/native/Linux-i386-32:/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work/tmp
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=i386
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-128.el5PAE
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=dmc2
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/dmc2
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/disk4/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work
2010-05-19 01:01:06,118 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=dt4.dt.sv4.decarta.com:2181 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@11f23e5
2010-05-19 01:01:06,120 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset
is false
2010-05-19 01:01:06,136 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server
dt4.dt.sv4.decarta.com/10.241.6.82:2181
2010-05-19 01:01:06,141 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected
local=/10.241.0.69:34497 remote=dt4.dt.sv4.decarta.com/10.241.6.82:2181]
2010-05-19 01:01:06,155 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-05-19 01:11:24,084 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x1288430447a0449
2010-05-19 01:11:24,084 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session:
0x1288430447a0449


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Wednesday, May 19, 2010 3:29 PM
To: user@hbase.apache.org
Subject: Re: Put slows down and eventually blocks

What do the MR logs say?  Do they point to an explicit row or region for failed task?  Can
you trace life of that region by grepping it in master logs?  

At time that the failing task runs, grep the regionservers logs for that time.  What is going
on?  Can you get from the bad region now, after MR has gone away?  Are your tasktrackers running
beside your datanodes and regionservers?  Swapping?  i/o stress?  If so, try running w/ less
concurrent tasks at a time.  See if that helps?

St.Ack

On Wed, May 19, 2010 at 9:23 AM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> Another update...actually even after the flush and major_compact the 
> Puts still stopped. I checked my job this morning and it had 
> progressed farther, but ultimately still was killed on the 10 minute timeouts.
>
> -geoff
>
> ________________________________
>
> From: Geoff Hendrey
> Sent: Wednesday, May 19, 2010 12:26 AM
> To: 'hbase-user@hadoop.apache.org'
> Subject: RE: Put slows down and eventually blocks
>
>
> Following up on my last post, I ran "flush" and "major_compact" from 
> the shell, and it seems to have jolted HBase into resuming writes. The 
> blocked Put method returned, and writes have now resumed normally. Any 
> ideas why? Here are a few other relevant details:
>
>
> hbase(main):015:0> zk_dump
>
> HBase tree in ZooKeeper is rooted at /hbase
>  Cluster up? true
>  In safe mode? false
>  Master address: 10.241.6.82:60000
>  Region server holding ROOT: 10.241.6.83:60020
>  Region servers:
>    - 10.241.6.83:60020
>    - 10.241.6.81:60020
>    - 10.241.6.82:60020
>  Quorum Server Statistics:
>    - dt5:2181
>        Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
>        Clients:
>         /10.241.6.81:52081[1](queued=0,recved=35496,sent=0)
>         /10.241.6.82:38365[1](queued=0,recved=32798,sent=0)
>         /10.241.6.82:60720[1](queued=0,recved=0,sent=0)
>         /10.241.6.82:40457[1](queued=0,recved=114,sent=0)
>
>        Latency min/avg/max: 0/15/669
>        Received: 73534
>        Sent: 0
>        Outstanding: 0
>        Zxid: 0x500033498
>        Mode: leader
>        Node count: 13
>    - dt4:2181
>        Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
>        Clients:
>         /10.241.0.18:39273[1](queued=0,recved=34,sent=0)
>         /10.241.6.82:43315[1](queued=0,recved=0,sent=0)
>         /10.241.6.81:41762[1](queued=0,recved=169,sent=0)
>         /10.241.6.83:47803[1](queued=0,recved=35438,sent=0)
>
>        Latency min/avg/max: 0/2/2249
>        Received: 1432019
>        Sent: 0
>        Outstanding: 0
>        Zxid: 0x500033498
>        Mode: follower
>        Node count: 13
>    - dt3:2181
>        Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
>        Clients:
>         /10.241.6.82:59048[1](queued=0,recved=0,sent=0)
>         /10.241.6.82:50822[1](queued=0,recved=36260,sent=0)
>         /10.241.6.81:45696[1](queued=0,recved=30691,sent=0)
>         /10.241.6.83:50027[1](queued=0,recved=36261,sent=0)
>         /10.241.6.82:50823[1](queued=0,recved=36270,sent=0)
>
>        Latency min/avg/max: 0/3/40
>        Received: 140600
>        Sent: 0
>        Outstanding: 0
>        Zxid: 0x500033498
>        Mode: follower
>        Node count: 13
> hbase(main):016:0> status
> 3 servers, 0 dead, 227.3333 average load
> hbase(main):017:0> flush "SEARCH_KEYS"
> 0 row(s) in 0.7600 seconds
> hbase(main):018:0> status
> 3 servers, 0 dead, 228.3333 average load
>
>
> ________________________________
>
> From: Geoff Hendrey
> Sent: Tuesday, May 18, 2010 11:56 PM
> To: hbase-user@hadoop.apache.org
> Subject: Put slows down and eventually blocks
>
>
> I am experiencing a problem in which Put operations transition from 
> working just fine, to blocking forever. I am doing Put from a reducer. 
> I have tried the following, but none of them prevents the Puts from 
> eventually blocking totally in all the reducers, until the task 
> tracker kills the task due to 10 minute timeout.
>
> 1) try using just one reducer (didn't help)
> 2) try Put.setWriteToWAL both true and false (didn't help)
> 3) try autoflush true and false. When true, experiment with different 
> flush buffer sizes (didn't help)
>
> I've been watching the HDFS namenode and datanode logs, and also the 
> HBase master and region servers. I am running a 3-node HDFS cluster
> (0.20.2) sharing the same 3 nodes with HBase 0.20.3. I see no problems in any 
> logs, except that the datanode logs eventually stop showing WRITE 
> operations (corresponding to the Put operations eventually coming to a 
> halt). The HBase shell remains snappy and I can do list and status 
> operations and scans without any issue from the shell.
>
> Anyone ever seen anything like this?
>
> -geoff
>
>
> http://www.decarta.com
>
>
>
>
>
