hbase-dev mailing list archives

From "Yoram Kulbak (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2248) New MemStoreScanner copies memstore for each scan, makes short scans slow
Date Wed, 24 Feb 2010 03:07:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837591#action_12837591 ]

Yoram Kulbak commented on HBASE-2248:

I did the following sanity check: I rolled back memstore to its state just before HBASE-2037 was applied
[last commit on 21 Oct 2009].
[ To get things going I had to put back the MemStore#numKeyValues method and change the MemStore#clearSnapshot
  argument to SortedSet. ]

I then ran TestHRegion and two tests failed:
- testFlushCacheWhileScanning - demonstrates the incorrect scans that occur while a snapshot exists
- testWritesWhileScanning - demonstrates 'partial puts' being visible to the scanner
I also ran TestMemStore, and all the tests there passed. I didn't try running
the whole suite.

It took me a while to figure out what exactly goes wrong when a snapshot exists. The short
(and vague) explanation is that the scanner may return keys out of order, meaning
a KV with a higher row may be returned before a KV with a lower row, because the result list
which aggregates results from both snapshot and kvset doesn't guarantee that the KVs are added
in sorted order. I think there's a way to add a simple test to TestMemStore which will demonstrate
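
To make the ordering problem concrete, here is a minimal sketch. It is not the actual pre-HBASE-2037
MemStore code: plain strings stand in for KeyValues, and the class and method names (MergeOrderSketch,
appendBoth, mergeBoth) are made up for illustration. It only shows that appending the contents of two
individually sorted sets to one list can put a higher row ahead of a lower one, while a two-way merge
keeps the order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: strings stand in for KeyValues, and none of
// these names come from the HBase code base.
class MergeOrderSketch {

    // Appends everything from one sorted set, then everything from the other.
    // Each input is sorted, but the concatenation need not be.
    static List<String> appendBoth(TreeSet<String> snapshot, TreeSet<String> kvset) {
        List<String> out = new ArrayList<>(snapshot);
        out.addAll(kvset);
        return out;
    }

    // Two-way merge: repeatedly emits the smaller of the two current heads,
    // so the output is sorted whenever both inputs are sorted.
    static List<String> mergeBoth(TreeSet<String> snapshot, TreeSet<String> kvset) {
        List<String> out = new ArrayList<>();
        Iterator<String> a = snapshot.iterator();
        Iterator<String> b = kvset.iterator();
        String x = a.hasNext() ? a.next() : null;
        String y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            if (y == null || (x != null && x.compareTo(y) <= 0)) {
                out.add(x);
                x = a.hasNext() ? a.next() : null;
            } else {
                out.add(y);
                y = b.hasNext() ? b.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> snapshot = new TreeSet<>(Arrays.asList("row2", "row4"));
        TreeSet<String> kvset = new TreeSet<>(Arrays.asList("row1", "row3"));
        System.out.println(appendBoth(snapshot, kvset)); // [row2, row4, row1, row3]
        System.out.println(mergeBoth(snapshot, kvset));  // [row1, row2, row3, row4]
    }
}
```

In the appended list row4 comes back before row1, which is the kind of 'non ordered' result described above.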

> New MemStoreScanner copies memstore for each scan, makes short scans slow
> -------------------------------------------------------------------------
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>         Attachments: hbase-2248.gc, Screen shot 2010-02-23 at 10.33.38 AM.png, threads.txt
> HBASE-2037 introduced a new MemStoreScanner which triggers a ConcurrentSkipListMap.buildFromSorted
> clone of the memstore and snapshot when starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short scans. Some
> of our data represents a time series. The data is stored in time series order, MR jobs often
> insert/update new data at the end of the series, and queries usually have to pick up some
> or all of the series. These are often scans of 0-100 rows at a time. To load one page, we'll
> observe about 20 such scans being triggered concurrently, and they take 2 seconds to complete.
> Doing a thread dump of a region server shows many threads in ConcurrentSkipListMap.buildFromSorted
> which traverses the entire map of key values to copy it.
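
As a rough illustration of why the copy hurts short scans, the sketch below contrasts
ConcurrentSkipListMap.clone(), which copies every entry, with tailMap(), which returns a view over
the same map without copying. The class and method names (CloneVsViewSketch, build, firstKeyViaClone,
firstKeyViaView) and the Integer/String types are assumptions made for the example; this is neither
the MemStoreScanner code nor a proposed fix.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only: Integer keys stand in for KeyValues.
class CloneVsViewSketch {

    // Builds a map with keys 0..n-1.
    static ConcurrentSkipListMap<Integer, String> build(int n) {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        for (int i = 0; i < n; i++) {
            map.put(i, "v" + i);
        }
        return map;
    }

    // Copies the whole map before looking at it: the cost grows with the
    // size of the map, not with the number of rows the scan will read.
    static int firstKeyViaClone(ConcurrentSkipListMap<Integer, String> map, int startKey) {
        ConcurrentSkipListMap<Integer, String> copy = map.clone();
        return copy.ceilingKey(startKey);
    }

    // Uses a tail view backed by the original map: no entries are copied.
    static int firstKeyViaView(ConcurrentSkipListMap<Integer, String> map, int startKey) {
        ConcurrentNavigableMap<Integer, String> view = map.tailMap(startKey, true);
        return view.firstKey();
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = build(100000);
        System.out.println(firstKeyViaClone(map, 99990)); // 99990
        System.out.println(firstKeyViaView(map, 99990));  // 99990
    }
}
```

Both calls find the same key, but only the first pays for a full traversal. Note that the view is
only weakly consistent: it reflects later writes to the underlying map, so it is not a drop-in
replacement where a scan needs a stable picture of the data.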

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
