hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1207) Fix locking in memcache flush
Date Mon, 15 Jun 2009 05:42:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719403#action_12719403
] 

Jonathan Gray commented on HBASE-1207:
--------------------------------------

We also grab a lock when we swap the memcache and snapshot.  There are no concurrency issues
with open StoreScanners then, so we can drop the synchronization added by my patch in HBASE-1503.
*However*, I think there is a problem here.

We are not notifying readers when we swap the memcache and the snapshot.  So there is a period
of time, after the snapshot but before the flush, where we drop the write lock (allowing readers in).
We now have a case where the memcache is empty (snapshotted) but its contents have not made it to the
storefile yet.  In the case of gets, we look at both the memcache and the snapshot, so this
is not an issue.  For scanners, this is not the case.  A scanner may still have "peeked" at
a value in the memcache that has now been moved to the snapshot.  In other situations we might
iterate down and reach the top of the memcache, but it won't actually exist anymore.
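
To make the window concrete, here is a minimal sketch of the situation described above. It is not the HBase code: the class SnapshotSwapSketch and its methods put, snapshotMemcache, get and nextRowAfter are hypothetical names, and two plain sorted maps stand in for the memcache and the snapshot. It only assumes what the comment states: the swap happens under a write lock, gets consult both maps, and a naive scanner consults only the live memcache.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a store; not the real HStore/Memcache classes.
public class SnapshotSwapSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private ConcurrentSkipListMap<String, String> memcache = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();

    // Writers share the read lock so they only exclude the swap itself.
    public void put(String row, String value) {
        lock.readLock().lock();
        try {
            memcache.put(row, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The swap: the old memcache becomes the snapshot, a fresh map takes its place.
    // Nothing here tells open scanners that their data just moved.
    public void snapshotMemcache() {
        lock.writeLock().lock();
        try {
            snapshot = memcache;
            memcache = new ConcurrentSkipListMap<>();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Gets look at both the memcache and the snapshot, so they are unaffected.
    public String get(String row) {
        lock.readLock().lock();
        try {
            String value = memcache.get(row);
            return value != null ? value : snapshot.get(row);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Naive scanner step: only the live memcache is consulted.
    public String nextRowAfter(String lastRow) {
        lock.readLock().lock();
        try {
            return memcache.higherKey(lastRow);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SnapshotSwapSketch store = new SnapshotSwapSketch();
        store.put("row1", "a");
        store.put("row2", "b");

        String peeked = store.nextRowAfter("");   // scanner is positioned on row1
        store.snapshotMemcache();                 // swap happens, flush not yet done

        System.out.println(store.get("row2"));            // prints "b"
        System.out.println(store.nextRowAfter(peeked));   // prints "null": row2 is missed
    }
}
```

In this sketch the scanner would have to either be notified of the swap or read the snapshot as well to keep returning row2 until the flush completes.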

> Fix locking in memcache flush
> -----------------------------
>
>                 Key: HBASE-1207
>                 URL: https://issues.apache.org/jira/browse/HBASE-1207
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Ben Maurer
>            Assignee: Jonathan Gray
>             Fix For: 0.20.0
>
>
> Memcache flushing holds a write lock while it reopens StoreFileScanners. I had a case
> where this process timed out and caused an exception to be thrown, which made the region server
> believe it had been unable to flush its cache and shut itself down.
> Stack trace is:
> "regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher" daemon prio=10 tid=0x00000000562df400 nid=0x15d1 runnable [0x000000004108b000..0x000000004108bd90]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.zip.CRC32.updateBytes(Native Method)
>         at java.util.zip.CRC32.update(CRC32.java:45)
>         at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1061)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1616)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1666)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:371)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1943)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.io.HalfMapFileReader.next(HalfMapFileReader.java:192)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:110)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.updateReaders(StoreFileScanner.java:378)
>         at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
>         at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
>         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
>         - locked <0x00002aaab7b41d30> (a java.lang.Integer)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

