hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Victor Hsieh <victorhs...@gmail.com>
Subject Reducer failed with TableOutputFormat
Date Thu, 04 Feb 2010 02:37:30 GMT
Hi,

I'm running MapReduce with TableOutputFormat, and seems to have
similar trouble as HBASE-1603.  After digging to the log, I realized
that the time when my reducer failed might possibly related to region
"callgraph,0958HGCAKZ,1265213010477" split.  Could you guys help to
check it out?

Help appreciated,
Victor

PS. Here is the log of reducer, hbase master and region server around
the failure time (00:39:42).

==== Reducer log ====
2010-02-04 00:39:42,447 WARN org.apache.hadoop.mapred.TaskTracker:
Error running child
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server Some server, retryOnlyOne=true, index=0,
islastrow=false, tries=9, numtries=10, i=0, listsize=4444,
region=callgraph,0958HGCAKZ,1265213010477 for region
callgraph,0958HGCAKZ,1265213010477, row '0988JHE2YS', but failed after
10 attempts.
...

2010-02-04 00:39:44,769
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server Some server, retryOnlyOne=true, index=0,
islastrow=false, tries=9, numtries=10, i=0, listsize=4444,
region=callgraph,0958HGCAKZ,1265213010477 for region
callgraph,0958HGCAKZ,1265213010477, row '0972DHJAM2', but failed after
10 attempts.
Exceptions:

	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)
	at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:94)
	at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:55)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        ....

==== HBase Master log ====

2010-02-04 00:38:35,492 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region callgraph,0958HGCAKZ,1265213010477 in 21mins, 31sec
2010-02-04 00:38:35,492 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Starting split of region
callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Closing
callgraph,0958HGCAKZ,1265213010477: compactions & flushes disabled
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
region, no outstanding scanners on callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush
for region callgraph,0958HGCAKZ,1265213010477. Current region memstore
size 54.3m
2010-02-04 00:39:02,542 DEBUG
org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://master.hadoop.mydomain.org:9000/hbase/callgraph/1429362021/pointee/2375866396720998113,
entries=323359, sequenceid=55264870, memsize=54.3m, filesize=15.7m to
callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:39:02,542 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
of ~54.3m for region callgraph,0958HGCAKZ,1265213010477 in 26926ms,
sequence id=55264870, compaction requested=true
2010-02-04 00:39:02,543 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed pointee
2010-02-04 00:39:02,544 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
callgraph,0958HGCAKZ,1265213010477
...
2010-02-04 00:40:35,144 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split,
META updated, and report to master all successful. Old region=REGION
=> {NAME => 'callgraph,0958HGCAKZ,1265213010477', STARTKEY =>
'0958HGCAKZ', ENDKEY => '', ENCODED => 1429362021, OFFLINE => true,
SPLIT => true, TABLE => {{NAME => 'callgraph', FAMILIES => [{NAME =>
'pointee', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}]}}, new regions: callgraph,0958HGCAKZ,1265215115595,
callgraph,0972EKCRJ1,1265215115595. Split took 1mins, 59sec
2010-02-04 00:40:35,144 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Skipping compaction on
callgraph,0958HGCAKZ,1265213010477 because closing/closed

==== HBase Region Server log ====

2010-02-04 00:38:35,492 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on
region callgraph,0958HGCAKZ,1265213010477 in 21mins, 31sec
2010-02-04 00:38:35,492 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Starting split of region
callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Closing
callgraph,0958HGCAKZ,1265213010477: compactions & flushes disabled
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
region, no outstanding scanners on callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: No more row locks
outstanding on region callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:38:35,616 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush
for region callgraph,0958HGCAKZ,1265213010477. Current region memstore
size 54.3m
2010-02-04 00:39:02,542 DEBUG
org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://master.hadoop.mydomain.org:9000/hbase/callgraph/1429362021/pointee/2375866396720998113,
entries=323359, sequenceid=55264870, memsize=54.3m, filesize=15.7m to
callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:39:02,542 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
of ~54.3m for region callgraph,0958HGCAKZ,1265213010477 in 26926ms,
sequence id=55264870, compaction requested=true
2010-02-04 00:39:02,543 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed pointee
2010-02-04 00:39:02,544 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
callgraph,0958HGCAKZ,1265213010477
2010-02-04 00:40:35,144 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split,
META updated, and report to master all successful. Old region=REGION
=> {NAME => 'callgraph,0958HGCAKZ,1265213010477', STARTKEY =>
'0958HGCAKZ', ENDKEY => '', ENCODED => 1429362021, OFFLINE => true,
SPLIT => true, TABLE => {{NAME => 'callgraph', FAMILIES => [{NAME =>
'pointee', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}]}}, new regions: callgraph,0958HGCAKZ,1265215115595,
callgraph,0972EKCRJ1,1265215115595. Split took 1mins, 59sec
2010-02-04 00:40:35,144 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Skipping compaction on
callgraph,0958HGCAKZ,1265213010477 because closing/closed

Mime
View raw message