hbase-dev mailing list archives

From "Ruben Aguiar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13329) Memstore flush fails if data has always the same value, breaking the region
Date Tue, 24 Mar 2015 19:15:53 GMT
Ruben Aguiar created HBASE-13329:
------------------------------------

             Summary: Memstore flush fails if data has always the same value, breaking the region
                 Key: HBASE-13329
                 URL: https://issues.apache.org/jira/browse/HBASE-13329
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 1.0.1
         Environment: linux-debian-jessie
ec2 - t2.micro instances
            Reporter: Ruben Aguiar


While benchmarking my OpenTSDB cluster, I wrote a script that always sends the same value
(in this case 1) to HBase. After a few minutes the whole region server crashes, and the region
itself becomes impossible to open again (it can be neither assigned nor unassigned). The logs
show that when a memstore flush is triggered on a large region (128 MB), the flush throws an
error that kills the regionserver. On restart, replaying the edits generates the same error,
making the region unavailable. I tried to manually unassign, assign and close_region the region;
none of these worked, because the code that reads/replays the edits crashes.
From my investigation, this seems to be an overflow issue. The logs show that the function
getMinimumMidpointArray tried to access index -32743 of an array, which is extremely close to
the minimum short value in Java (-32768). Looking at the source code, a short appears to be used
as the index, and it is incremented for as long as the two arrays hold the same data, so it
probably overflows on large arrays with equal data. Changing it to int should solve the problem.
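
To illustrate the suspected failure mode, here is a minimal standalone sketch. It is not the
HBase source: the class and method names are hypothetical, and the loop only mimics the pattern
described above (an index that advances while two arrays hold equal bytes). The offset of 25 is
an arbitrary illustrative choice; it makes the failing index -32768 + 25 = -32743.

```java
import java.util.Arrays;

public class ShortIndexOverflow {

    // Hypothetical buggy variant: the index is a short, so incrementing it
    // past 32767 wraps around to -32768 and the next array access fails.
    public static int commonPrefixWithShortIndex(
            byte[] left, int leftOffset, byte[] right, int rightOffset, int length) {
        short diffIdx = 0;
        while (diffIdx < length
                && left[leftOffset + diffIdx] == right[rightOffset + diffIdx]) {
            diffIdx++;
        }
        return diffIdx;
    }

    // Same loop with an int index: no wrap-around for array-sized lengths.
    public static int commonPrefixWithIntIndex(
            byte[] left, int leftOffset, byte[] right, int rightOffset, int length) {
        int diffIdx = 0;
        while (diffIdx < length
                && left[leftOffset + diffIdx] == right[rightOffset + diffIdx]) {
            diffIdx++;
        }
        return diffIdx;
    }

    public static void main(String[] args) {
        // Two identical arrays longer than Short.MAX_VALUE, filled with the value 1.
        int offset = 25;      // assumption, for illustration only
        int length = 40_000;  // anything above 32767 triggers the wrap
        byte[] left = new byte[offset + length];
        Arrays.fill(left, (byte) 1);
        byte[] right = left.clone();

        System.out.println("int index:   "
                + commonPrefixWithIntIndex(left, offset, right, offset, length));
        try {
            commonPrefixWithShortIndex(left, offset, right, offset, length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("short index: " + e.getMessage());
        }
    }
}
```

With identical inputs the short index never finds a differing byte, so it keeps growing until it
wraps; the int variant simply returns the full length.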
Here are the logs from when the regionserver went down. Any help is appreciated. If you need
any other information, please let me know:
2015-03-24 18:00:56,187 INFO  [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
2015-03-24 18:00:56,188 INFO  [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
2015-03-24 18:04:35,722 INFO  [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB
2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
	at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
	at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953)
	... 7 more
2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
