hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2248) Provide new non-copy mechanism to assure atomic reads in get and scan
Date Tue, 16 Mar 2010 05:54:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845724#action_12845724 ]

stack commented on HBASE-2248:
------------------------------

Yeah, I just tried to run the test suite and ran into at least the TestHeapSize failures.

On a test up on the cluster, something is up.  It's not deadlocked, but it's only making slow progress.
 The regionservers are all waiting for something to do.  Will look in the morning.

On the patch:

+ "aka DNC" ... what's DNC? (Democratic National Committee?)
+ In KV, it has "+   * @deprecated".  Usually a deprecation note helpfully points to what should be
used instead.  What should folks use instead of the createFirstOnRow override?
+ +1 on this comment of yours "+      // TODO the family and qualifier should be compared
separately"
+ So, on flush of the MemStore, we don't need to clean out items that MemStore Deletes affect?
 Do we now let go of the old axiom that Deletes in storefiles only apply to storefiles that follow
and not to the current storefile?
+ I love all the stuff removed.
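To make the Delete question above concrete, here is a minimal Java sketch, assuming Deletes are resolved at read time across every source (MemStore plus storefiles) rather than by storefile order. `DeleteMaskSketch`, `Cell`, and `visible` are hypothetical names; a Delete is modelled as a column delete that hides every put at or below its timestamp, and none of this is code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: resolve Deletes at read time across every source
 * (MemStore and storefiles) instead of relying on storefile order.
 * Not the HBASE-2248 patch code.
 */
public class DeleteMaskSketch {
    /** Simplified cell for a single row/column: a timestamp plus a put value or a delete marker. */
    static final class Cell {
        final long ts;
        final boolean isDelete;
        final String value;

        Cell(long ts, boolean isDelete, String value) {
            this.ts = ts;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    /**
     * Returns the visible put values, newest first. The newest delete marker
     * (modelled as a column delete) hides every put with ts <= its timestamp,
     * regardless of which source, including the same one, holds the put.
     */
    static List<String> visible(List<List<Cell>> sources) {
        long maxDeleteTs = Long.MIN_VALUE;
        List<Cell> puts = new ArrayList<>();
        for (List<Cell> source : sources) {
            for (Cell c : source) {
                if (c.isDelete) {
                    maxDeleteTs = Math.max(maxDeleteTs, c.ts);
                } else {
                    puts.add(c);
                }
            }
        }
        puts.sort((a, b) -> Long.compare(b.ts, a.ts)); // newest first
        List<String> out = new ArrayList<>();
        for (Cell p : puts) {
            if (p.ts > maxDeleteTs) { // only puts newer than the delete survive
                out.add(p.value);
            }
        }
        return out;
    }
}
```

Under this model a Delete masks a put held in the same storefile, which is exactly the case the old axiom excluded.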

More review later.

What do we see as the implications of removing the special Get code path?

+ Is it true that you can now do inserts where timestamps are out of order (if there are no deletes)?
 If so, don't we need unit tests to prove this assertion?
+ What about performance?  Though the new Get-Scan does storefile accesses in parallel, if there are
more than N storefiles and we're looking for the latest version only, we'll be slower (at least until
we add BFs, i.e. Bloom filters?).
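A minimal Java sketch of why a Get run as a scan has to position every source: each source (MemStore or storefile) holds one cell's timestamps sorted newest first, and a max-heap merges the source heads, roughly in the spirit of HBase's KeyValueHeap. `GetAsScanSketch`, `Head`, and `newestVersions` are hypothetical names, and the sketch assumes timestamps may be out of order across files, so no source can be skipped.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of "Get as Scan": the newest versions of one cell are
 * found by merging every sorted source through a heap. Illustrative only.
 */
public class GetAsScanSketch {
    /** Current head of one source: its timestamp, source index, and position. */
    static final class Head {
        final long ts;
        final int source;
        final int pos;

        Head(long ts, int source, int pos) {
            this.ts = ts;
            this.source = source;
            this.pos = pos;
        }
    }

    /**
     * Each source holds timestamps for the same cell, sorted newest first.
     * Returns up to n timestamps across all sources, newest first.
     */
    static long[] newestVersions(List<long[]> sources, int n) {
        // Max-heap on timestamp: the newest head across all sources is polled first.
        PriorityQueue<Head> heap = new PriorityQueue<>(
            Comparator.comparingLong((Head h) -> h.ts).reversed());
        // Every non-empty source must be seeked, since any file may hold the newest version.
        for (int i = 0; i < sources.size(); i++) {
            long[] s = sources.get(i);
            if (s.length > 0) {
                heap.add(new Head(s[0], i, 0));
            }
        }
        long[] out = new long[n];
        int count = 0;
        while (count < n && !heap.isEmpty()) {
            Head h = heap.poll();
            out[count++] = h.ts;
            long[] s = sources.get(h.source);
            if (h.pos + 1 < s.length) { // advance that source to its next version
                heap.add(new Head(s[h.pos + 1], h.source, h.pos + 1));
            }
        }
        return Arrays.copyOf(out, count);
    }
}
```

If sources are ordered MemStore first and oldest storefile last, `newestVersions(Arrays.asList(new long[]{5}, new long[]{3}, new long[]{9, 2}), 2)` returns `[9, 5]`: the newest version lives in the oldest file, so stopping at the first source holding the cell would wrongly return 5. The price is one seek per non-empty source, which is the storefile-count concern above.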

> Provide new non-copy mechanism to assure atomic reads in get and scan
> ---------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>            Priority: Blocker
>             Fix For: 0.20.4
>
>         Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch, HBASE-2248-GetsAsScans3.patch,
HBASE-2248-rr-alpha1.txt, HBASE-2248-ryan.patch, hbase-2248.gc, HBASE-2248.patch, hbase-2248.txt,
readownwrites-lost.2.patch, readownwrites-lost.patch, Screen shot 2010-02-23 at 10.33.38 AM.png,
threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a ConcurrentSkipListMap.buildFromSorted
clone of the memstore and snapshot when starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short scans.  Some
of our data represent a time series.  The data is stored in time-series order, MR jobs often
insert/update new data at the end of the series, and queries usually have to pick up some
or all of the series.  These are often scans of 0-100 rows at a time.  To load one page, we
observe about 20 such scans being triggered concurrently, and they take 2 seconds to complete.
 A thread dump of a region server shows many threads in ConcurrentSkipListMap.buildFromSorted,
which traverses the entire map of key values to copy it.
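The cost described above is a full copy of the memstore every time a scan starts. One non-copy alternative is an MVCC-style read point. Below is a minimal Java sketch, assuming each cell is tagged with a write number and a scan hides any cell written after the read point it captured at start. `ReadPointSketch`, `beginWrite`, `insert`, `completeWrite`, and `scan` are hypothetical names, writes are assumed to complete in order, and this is not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: atomic multi-row reads via a read point, with no map copy. */
public class ReadPointSketch {
    /** Key: row ascending, then write number descending (newest version first). */
    static final class Key {
        final String row;
        final long writeNumber;

        Key(String row, long writeNumber) {
            this.row = row;
            this.writeNumber = writeNumber;
        }
    }

    static final Comparator<Key> ORDER = Comparator.comparing((Key k) -> k.row)
        .thenComparing(Comparator.comparingLong((Key k) -> k.writeNumber).reversed());

    private final ConcurrentSkipListMap<Key, String> memstore = new ConcurrentSkipListMap<>(ORDER);
    private final AtomicLong nextWrite = new AtomicLong(0);
    private final AtomicLong readPoint = new AtomicLong(0);

    /** Reserves a write number; cells inserted under it stay hidden until completeWrite. */
    long beginWrite() {
        return nextWrite.incrementAndGet();
    }

    void insert(long writeNumber, String row, String value) {
        memstore.put(new Key(row, writeNumber), value);
    }

    /** Publishes a write. Assumes writes complete in order (a single writer at a time). */
    void completeWrite(long writeNumber) {
        readPoint.set(writeNumber);
    }

    /** Walks the live map (no copy) and returns "row=value" for the newest visible version per row. */
    List<String> scan() {
        long rp = readPoint.get(); // captured once when the scan starts
        List<String> out = new ArrayList<>();
        String lastRow = null;
        for (Map.Entry<Key, String> e : memstore.entrySet()) {
            Key k = e.getKey();
            if (k.writeNumber > rp) continue;     // written after this scan started
            if (k.row.equals(lastRow)) continue;  // older version of a row already emitted
            out.add(k.row + "=" + e.getValue());
            lastRow = k.row;
        }
        return out;
    }
}
```

The scan iterates the live ConcurrentSkipListMap, whose iterators are weakly consistent, so visibility is decided by the read-point check rather than by whatever the iterator happens to observe. Old versions stay in the map here; a real memstore would need to drop them once no scanner can still see them.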

-- 
This message is automatically generated by JIRA.

