hbase-issues mailing list archives

From "Ruben Aguiar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13329) ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
Date Mon, 15 Jun 2015 10:15:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585739#comment-14585739 ]

Ruben Aguiar commented on HBASE-13329:

I'm sorry, but I cannot provide a test for this. The issue occurred 3 months ago
and I moved to a different project long ago. The machines that held information about
the cluster have been terminated, so I have no way to reproduce the problem. All I can say is
that the change I made did correct the problem I was having. After I made the change, the
region flushed successfully. Take that information however you want, but that's all I can
provide.

> ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
> ------------------------------------------------------------------------
>                 Key: HBASE-13329
>                 URL: https://issues.apache.org/jira/browse/HBASE-13329
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.0.1
>         Environment: linux-debian-jessie
> ec2 - t2.micro instances
>            Reporter: Ruben Aguiar
>            Assignee: Ruben Aguiar
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.1
>         Attachments: 13329-asserts.patch, 13329-v1.patch
> While trying to benchmark my OpenTSDB cluster, I created a script that always sends the
> same value (in this case 1) to HBase. After a few minutes, the whole region server crashes
> and the region itself becomes impossible to open again (cannot assign or unassign). After
> some investigation, what I saw in the logs is that when a memstore flush is called on a large
> region (128 MB), the process errors out, killing the region server. On restart, replaying the
> edits generates the same error, making the region unavailable. I tried to manually unassign,
> assign, or close_region. That didn't work because the code that reads/replays the edits crashes.
> From my investigation, this seems to be an overflow issue. The logs show that the function
> getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the
> minimum short value in Java. Upon investigation of the source code, it seems a short index is
> used and incremented as long as the two vectors are the same, probably making it overflow on
> large vectors with equal data. Changing it to int should solve the problem.
> Here follow the Hadoop logs from when the region server went down. Any help is appreciated.
> If you need any other information, please tell me:
> 2015-03-24 18:00:56,187 INFO  [regionserver//] wal.FSHLog: Rolled
WAL /hbase/WALs/,16020,1427216382590/
with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/,16020,1427216382590/
> 2015-03-24 18:00:56,188 INFO  [regionserver//] wal.FSHLog: Archiving
to hdfs://
> 2015-03-24 18:04:35,722 INFO  [MemStoreFlusher.0] regionserver.HRegion: Started memstore
flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size
128.04 MB
> 2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING
region server,16020,1427216382590: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
> 	at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
> 	at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
> 	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
> 	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
> 	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
> 	at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
> 	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71)
> 	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879)
> 	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1953)
> 	... 7 more
> 2015-03-24 18:04:36,156 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: RegionServer
abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
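
The overflow described in the issue can be illustrated with a minimal, self-contained sketch. This is not the HBase source: the class and method names below are hypothetical, and the loop only mirrors the pattern the report describes (a short index incremented while two byte arrays hold equal data). It assumes two equal arrays longer than Short.MAX_VALUE (32767) bytes.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and not taken from HBase.
class ShortIndexOverflowSketch {

    // Counts the shared prefix with a short index, the pattern the report describes.
    static int sharedPrefixShort(byte[] left, byte[] right) {
        int minLength = Math.min(left.length, right.length);
        short idx = 0;
        // Once idx reaches 32767, idx++ wraps to -32768. The bound check
        // (idx < minLength) still passes, so the next array access uses a
        // negative index and throws ArrayIndexOutOfBoundsException.
        while (idx < minLength && left[idx] == right[idx]) {
            idx++;
        }
        return idx;
    }

    // Same loop with an int index, the change proposed in the report.
    static int sharedPrefixInt(byte[] left, byte[] right) {
        int minLength = Math.min(left.length, right.length);
        int idx = 0;
        while (idx < minLength && left[idx] == right[idx]) {
            idx++;
        }
        return idx;
    }

    // Reports whether the short-index version fails on the given inputs.
    static boolean overflowsWithShort(byte[] left, byte[] right) {
        try {
            sharedPrefixShort(left, right);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two identical arrays longer than Short.MAX_VALUE, filled with the value 1.
        byte[] a = new byte[40000];
        byte[] b = new byte[40000];
        Arrays.fill(a, (byte) 1);
        Arrays.fill(b, (byte) 1);

        System.out.println("short index overflows: " + overflowsWithShort(a, b));
        System.out.println("int index shared prefix: " + sharedPrefixInt(a, b));
    }
}
```

With these inputs the short-index loop throws, while the int-index loop returns 40000. As a side note, -32743 equals Short.MIN_VALUE (-32768) plus 25, which would be consistent with a wrapped short index added to a small positive array offset; that reading is an inference from the number, not something the logs state.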

This message was sent by Atlassian JIRA