hbase-dev mailing list archives

From "Dan Washusen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2248) New MemStoreScanner copies memstore for each scan, makes short scans slow
Date Wed, 24 Feb 2010 01:29:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837562#action_12837562 ]

Dan Washusen commented on HBASE-2248:
-------------------------------------

@Dave: 

Correct you are.  I've added comments on HBASE-2249 as a result of your comments here...

It's worth noting that in the case of ScanTest the cost of setting up the ResultScanner is
negligible compared to the cost of scanning over the majority of the table.  The ScanTest
takes 23 seconds in total according to the log output (including opening the scanner, etc.).

Dave, the numbers I posted above (9ms) were from the RandomScanWithRangeTest.  As you mention,
these tests include the cost of opening the scanner.  I was under the impression that this
was closer to your use case (i.e. specifying both a scan.startRow and a scan.stopRow, which
returns a small number of rows)...?
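A rough cost model shows why the same scanner-opening cost is invisible in ScanTest but dominant for short range scans. The symbols are hypothetical and not from the thread: assume opening a scanner clones the memstore and snapshot at a cost of $\alpha$ per entry over $n$ entries, and each returned row costs $c_{\mathrm{row}}$ (store-file reads are ignored here).

```latex
T(k) = \alpha n + k\,c_{\mathrm{row}}, \qquad
\frac{\alpha n}{T(k)} = \frac{\alpha n}{\alpha n + k\,c_{\mathrm{row}}}
```

When $k$ covers most of the table, as in ScanTest, the setup fraction tends toward 0. For scans of 0-100 rows it tends toward 1. With about 20 such scans per page load, each one pays its own $\alpha n$ clone, for roughly $20\,\alpha n$ of copying work regardless of how few rows are returned.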

> New MemStoreScanner copies memstore for each scan, makes short scans slow
> -------------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>
>         Attachments: hbase-2248.gc, Screen shot 2010-02-23 at 10.33.38 AM.png, threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a ConcurrentSkipListMap.buildFromSorted
> clone of the memstore and snapshot when starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short scans.  Some
> of our data represent a time series.  The data is stored in time series order, MR jobs often
> insert/update new data at the end of the series, and queries usually have to pick up some
> or all of the series.  These are often scans of 0-100 rows at a time.  To load one page, we
> observe about 20 such scans being triggered concurrently, and they take 2 seconds to complete.
> Doing a thread dump of a region server shows many threads in ConcurrentSkipListMap.buildFromSorted,
> which traverses the entire map of key values to copy it.
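A minimal, self-contained sketch of the effect described above. This is not HBase code: the class ShortScanSketch and its helpers are hypothetical, and a String-keyed map stands in for the memstore. In OpenJDK, the ConcurrentSkipListMap(SortedMap) copy constructor (like clone()) fills the copy via buildFromSorted, so the copy walks all n entries even when the scan then reads only 100 rows. A tailMap view, by contrast, only costs the seek to the start key plus the rows actually visited.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch (not HBase code): clone-per-scan vs. range view. */
public class ShortScanSketch {

    /** Hypothetical helper: builds a stand-in "memstore" of n sorted rows. */
    static ConcurrentSkipListMap<String, String> buildMemstore(int n) {
        ConcurrentSkipListMap<String, String> m = new ConcurrentSkipListMap<>();
        for (int i = 0; i < n; i++) {
            m.put(String.format("row%08d", i), "value" + i);
        }
        return m;
    }

    /** Clone-then-scan: copying the whole map is O(n) work on every scan. */
    static int scanByCopy(ConcurrentSkipListMap<String, String> memstore,
                          String startRow, int limit) {
        ConcurrentSkipListMap<String, String> copy =
                new ConcurrentSkipListMap<String, String>(memstore);
        return countFrom(copy.tailMap(startRow, true), limit);
    }

    /** View-based scan: tailMap is a live view, nothing is copied. */
    static int scanByView(ConcurrentSkipListMap<String, String> memstore,
                          String startRow, int limit) {
        return countFrom(memstore.tailMap(startRow, true), limit);
    }

    /** Reads at most `limit` rows from the view and returns how many were read. */
    private static int countFrom(NavigableMap<String, String> view, int limit) {
        Iterator<String> it = view.keySet().iterator();
        int seen = 0;
        while (seen < limit && it.hasNext()) {
            it.next();
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> memstore = buildMemstore(500_000);
        String start = String.format("row%08d", 499_900);

        long t0 = System.nanoTime();
        int a = scanByCopy(memstore, start, 100);
        long t1 = System.nanoTime();
        int b = scanByView(memstore, start, 100);
        long t2 = System.nanoTime();

        System.out.printf("copy: %d rows in %.2f ms%n", a, (t1 - t0) / 1e6);
        System.out.printf("view: %d rows in %.2f ms%n", b, (t2 - t1) / 1e6);
    }
}
```

The printed timings depend on the JVM and hardware; the point is that the copy cost grows with the memstore size while the view cost grows only with the rows read. Note that a live view is weakly consistent under concurrent updates, whereas a copy gives a point-in-time snapshot, so a view is not automatically a drop-in replacement.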

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

