hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2248) New MemStoreScanner copies memstore for each scan, makes short scans slow
Date Tue, 23 Feb 2010 17:54:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837343#action_12837343
] 

stack commented on HBASE-2248:
------------------------------

.bq Can anyone shed light on why HBASE-2037 introduced this clone in the first place? Seems
like a totally braindead thing for performance. 

Mea Culpa. I should have caught this in review, the non-scalable, expensive full-copy.  Dumb.

I also should have run PE to catch the performance degradation before release, though in this
case, according to Dan, PE as it is now would not have caught the slowed-down memstore, since
we flush after each PE run and since the short-scan test is new with no history. (A long time
ago I wrote up a how-to-release: http://wiki.apache.org/hadoop/Hbase/HowToRelease. It says
PE is required, but I think I've not followed this recipe in a good while now.)

.bq The 0.20.2 Memstore was using the ConcurrentSkipListMap#tailMap for every row. tailMap
incurs an O(log n) overhead when called on a ConcurrentSkipListMap, so the total overhead of
scanning the whole memstore may, in some cases, be very close to the overhead of a complete
sort of the KVs in memstore.

In the old implementation, we also used to make a copy of a row every time we called next,
to protect against the case where the snapshot was removed out from under us.
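
To make the tailMap cost above concrete, here is a minimal sketch, not the actual MemStore code. It assumes plain String keys in place of KeyValues, and the class and method names (TailMapScanSketch, scanWithTailMapPerRow, scanWithSingleIterator) are made up for illustration. The first method re-seeks with tailMap for every entry it returns, paying a skip-list search each time; the second seeks once and then walks the view with one iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; String keys stand in for KeyValues.
class TailMapScanSketch {

    // Re-seek on every step: each tailMap(...).firstEntry() searches the
    // skip list from the top, so a scan of m entries costs about m * log(n).
    static List<String> scanWithTailMapPerRow(
            ConcurrentSkipListMap<String, String> map, String startKey) {
        List<String> out = new ArrayList<>();
        Map.Entry<String, String> e = map.tailMap(startKey, true).firstEntry();
        while (e != null) {
            out.add(e.getKey());
            // Exclusive tail: the first entry strictly after the current key.
            e = map.tailMap(e.getKey(), false).firstEntry();
        }
        return out;
    }

    // Seek once, then follow the bottom-level links with a single iterator.
    // The tailMap view is not a copy; it is backed by the live map.
    static List<String> scanWithSingleIterator(
            ConcurrentSkipListMap<String, String> map, String startKey) {
        List<String> out = new ArrayList<>();
        for (String key : map.tailMap(startKey, true).keySet()) {
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        for (String k : new String[] {"row1", "row2", "row3", "row4"}) {
            map.put(k, "value-" + k);
        }
        System.out.println(scanWithTailMapPerRow(map, "row2"));
        System.out.println(scanWithSingleIterator(map, "row2"));
    }
}
```

Both methods return the same keys on a quiescent map; they differ only in how many skip-list searches they perform. Neither one copies the map, which is the cost this issue is about.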

.bq The scanner scans incorrectly when a snapshot exists

Why was this again?

.bq ... increased GC overhead on multiple concurrent scans

Dave, can you enable GC logging?  Even if this is the case, it needs to be addressed.

.bq Is it possible to avoid both 'partial puts' and cloning by 'timestamping' memstore records?
e.g. each new KV in memstore gets a 'memstore timestamp' and when a scanner is created it
grabs the current timestamp so that it knows to ignore KVs which entered the store after its
creation? Should probably use a counter and not currentTimeMillis to ensure a clear-cut.

How would we snapshot such a thing?

We could add another ts/counter to KV.  We could do an AND on the type setting a bit if extra
ts is present.  We then write out the KV as old style, dropping extra ts when we flush to
hfile, or we just dump it all out.  System would need to be able to work with old-style KVs.
 Comparator would be adjusted to accommodate new KV.   We'd do a tailset each time we made
a scanner?  This would be a big change.  We should probably bump rpc version and require a
restart of hbase cluster on upgrade.
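
For discussion, the counter idea quoted above could be sketched roughly as follows. This is an assumption-laden toy, not HBase code: Cell, SeqMemStoreSketch, readPoint and scan are hypothetical names, cells carry a String row and value instead of a real KV, and the extra counter lives only in memory. It also sidesteps the hard parts raised here (snapshotting, flushing without the extra ts, old-style KVs) and does not handle a put that has taken its counter value but is not yet inserted when a scanner grabs its read point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of "memstore timestamping" with a counter.
class SeqMemStoreSketch {

    // Stand-in for a KV with one extra field: the memstore sequence number.
    static final class Cell {
        final String row;
        final String value;
        final long seq;

        Cell(String row, String value, long seq) {
            this.row = row;
            this.value = value;
            this.seq = seq;
        }
    }

    // A counter rather than currentTimeMillis, so the cut is unambiguous.
    private final AtomicLong counter = new AtomicLong();

    // Ordered by row, then newest sequence number first within a row.
    private final ConcurrentSkipListSet<Cell> cells = new ConcurrentSkipListSet<>(
            (a, b) -> {
                int c = a.row.compareTo(b.row);
                return c != 0 ? c : Long.compare(b.seq, a.seq);
            });

    void put(String row, String value) {
        // Caveat: a scanner created between these two steps could later see
        // this cell appear mid-scan; a real design must close that window.
        long seq = counter.incrementAndGet();
        cells.add(new Cell(row, value, seq));
    }

    // A scanner grabs this once at creation time.
    long readPoint() {
        return counter.get();
    }

    // Walk the live set without cloning, skipping cells newer than readPoint.
    List<String> scan(long readPoint) {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.seq <= readPoint) {
                out.add(c.row + "=" + c.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SeqMemStoreSketch store = new SeqMemStoreSketch();
        store.put("r1", "a");
        store.put("r2", "b");
        long rp = store.readPoint();      // scanner created here
        store.put("r3", "c");             // arrives after the scanner
        System.out.println(store.scan(rp));
    }
}
```

The scanner never copies the set; it filters on the counter value it grabbed at creation, which is what would let it ignore KVs that entered the store afterwards.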

> New MemStoreScanner copies memstore for each scan, makes short scans slow
> -------------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>
>         Attachments: threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a ConcurrentSkipListMap.buildFromSorted
clone of the memstore and snapshot when starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short scans.  Some
of our data represent a time series.   The data is stored in time series order, MR jobs often
insert/update new data at the end of the series, and queries usually have to pick up some
or all of the series.  These are often scans of 0-100 rows at a time.  To load one page, we'll
observe about 20 such scans being triggered concurrently, and they take 2 seconds to complete.
 Doing a thread dump of a region server shows many threads in ConcurrentSkipListMap.buildFromSorted
which traverses the entire map of key values to copy it.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

