hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1207) Fix locking in memcache flush
Date Mon, 15 Jun 2009 07:00:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719429#action_12719429
] 

ryan rawson commented on HBASE-1207:
------------------------------------

following your logic, it suggests that a scanner could get an empty memcache, and files 1-N,
but we might scan some, then once notifyChangedReadersObservers() is called, we get file N+1.
 If the data that was flushed is 'before' the key that is the 'top' key, we have missed some
data that was written, but not flushed.

The hole is smalish, but not tiny.  Flushes take about 700ms.

I think maybe we should:
- add the snapshot to the scan
- notify changed observers when snapshot is created (and catching the snapshot if it wasnt
there)
- notify changed observers when snapshot is finished flushing to disk (already doing)

This should close the hole.

But we should think of a way to verify this bug, and then be able to validate it later on.


> Fix locking in memcache flush
> -----------------------------
>
>                 Key: HBASE-1207
>                 URL: https://issues.apache.org/jira/browse/HBASE-1207
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Ben Maurer
>            Assignee: Jonathan Gray
>             Fix For: 0.20.0
>
>
> memcache flushing holds a write lock while it reopens StoreFileScanners. I had a case
where this process timed out and caused an exception to be thrown, which made the region server
believe it had been unable to flush it's cache and shut itself down.
> Stack trace is:
> #
> "regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher" daemon prio=10 tid=0x00000000562df400
nid=0x15d1 runnable [0x000000004108b000..0x000000004108bd90]
> #
>    java.lang.Thread.State: RUNNABLE
> #
>         at java.util.zip.CRC32.updateBytes(Native Method)
> #
>         at java.util.zip.CRC32.update(CRC32.java:45)
> #
>         at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
> #
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
> #
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
> #
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
> #
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
> #
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
> #
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1061)
> #
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
> #
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1616)
> #
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> #
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1666)
> #
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> #
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
> #
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> #
>         at java.io.DataInputStream.readInt(DataInputStream.java:371)
> #
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1943)
> #
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
> #
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
> #
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
> #
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
> #
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
> #
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
> #
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
> #
>         at org.apache.hadoop.hbase.io.HalfMapFileReader.next(HalfMapFileReader.java:192)
> #
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
> #
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
> #
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:110)
> #
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.updateReaders(StoreFileScanner.java:378)
> #
>         at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
> #
>         at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
> #
>         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
> #
>         - locked <0x00002aaab7b41d30> (a java.lang.Integer)
> #
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
> #
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
> #
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
> #
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
> #
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message