hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8547) Fix java.lang.RuntimeException: Cached an already cached block
Date Wed, 15 May 2013 02:35:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657818#comment-13657818

Enis Soztutar commented on HBASE-8547:

bq. Is it true to say that this can only happen when bottom half of a split is looking for
block at offset 0 – first block – or top-half of a split is looking for last block in
the file?
This is not just block offset 0. I think the trigger is that when we have a midkey for a half
file which does not correspond to the block boundary, both the top and bottom regions will
try to read the whole block. If this happens concurrently then we get an exception. In HBASE-6479,
reports a similar case, but in that they only encountered this for meta blocks, which are
read by both half store files. 

bq. Your unit test verifies half file has this issue?
Yes, I can reproduce this in the test environment, but it does not mimic the requests coming
from upper layers. 

After some offline discussion, I think it makes sense to split this issue into two, and do
just the changes in v2-reduced patch. The reasoning is that the changes are proving to be
quite large with the refactoring, and we have to spend some time to think how are we going
to handle singleton LruBlockCache, CacheConfig, etc. 

> Fix java.lang.RuntimeException: Cached an already cached block
> --------------------------------------------------------------
>                 Key: HBASE-8547
>                 URL: https://issues.apache.org/jira/browse/HBASE-8547
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.94.8, 0.95.1
>         Attachments: hbase-8547_v1-0.94.patch, hbase-8547_v1-0.94.patch, hbase-8547_v1.patch,
> In one test, one of the region servers received the following on 0.94. 
> Note HalfStoreFileReader in the stack trace. I think the root cause is that after the
region is split, the mid point can be in the middle of the block (for store files that the
mid point is not chosen from). Each half store file tries to load the half block and put it
in the block cache. Since IdLock is instantiated per store file reader, they do not share
the same IdLock instance, thus does not lock against each other effectively. 
> {code}
> 2013-05-12 01:30:37,733 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:·
> java.lang.RuntimeException: Cached an already cached block
>   at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
>   at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:237)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3829)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3896)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3778)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3770)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2643)
>   at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> {code}
> I can see two possible fixes: 
>  # Allow this kind of rare cases in LruBlockCache by not throwing an exception. 
>  # Move the lock instances to upper layer (possibly in CacheConfig), and let half hfile
readers share the same IdLock implementation. 

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message