hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13291) Lift the scan ceiling
Date Tue, 07 Apr 2015 04:58:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482589#comment-14482589 ]

ramkrishna.s.vasudevan commented on HBASE-13291:

Even with the tweaked patch we still call getKeyLength(), don't we?  That again ends up doing
Bytes.toInt().  Would it be possible to create a Cell that caches these values, like we did
in the BB cell (Anoop suggested the same in the comments above)?  KVs from the memstore will
not get that benefit, but we can see how much it improves things.  I have seen that doing the
caching removed those methods from the 'hot' methods.
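
To make the caching idea concrete, here is a minimal sketch, not the actual HBase code. It assumes a KeyValue-style layout (4-byte key length, 4-byte value length, then the key and value bytes). SerializedCell and LengthCachingCell are hypothetical names, and ByteBuffer.getInt() stands in for Bytes.toInt().

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a KeyValue-style cell backed by a byte[].
// Assumed layout: <keyLength:int><valueLength:int><key bytes><value bytes>.
class SerializedCell {
    protected final byte[] buf;
    protected final int offset;

    SerializedCell(byte[] buf, int offset) {
        this.buf = buf;
        this.offset = offset;
    }

    // Helper for the sketch only: builds a buffer in the assumed layout.
    static byte[] serialize(byte[] key, byte[] value) {
        return ByteBuffer.allocate(8 + key.length + value.length)
                .putInt(key.length)
                .putInt(value.length)
                .put(key)
                .put(value)
                .array();
    }

    // Decodes the big-endian int from the backing array on every call.
    int getKeyLength() {
        return ByteBuffer.wrap(buf, offset, 4).getInt();
    }

    int getValueLength() {
        return ByteBuffer.wrap(buf, offset + 4, 4).getInt();
    }
}

// Decodes both lengths once at construction; later calls are field reads.
class LengthCachingCell extends SerializedCell {
    private final int keyLength;
    private final int valueLength;

    LengthCachingCell(byte[] buf, int offset) {
        super(buf, offset);
        this.keyLength = super.getKeyLength();
        this.valueLength = super.getValueLength();
    }

    @Override
    int getKeyLength() {
        return keyLength;
    }

    @Override
    int getValueLength() {
        return valueLength;
    }
}
```

The trade-off in this sketch is two extra int fields per cell in exchange for skipping the decode on repeated calls. Whether that pays off on the scan path is something only profiling would show.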
And for the tags hack, we could add an interface with a hasTags API and set it when we
construct the KV.  Let all the Cell implementations implement that new interface as well.
By default let it return true.
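
A rough sketch of what such an interface could look like follows. TagAware, TaggedCellSketch, LegacyCellSketch and TagPath are made-up names for illustration, not existing HBase types, and the default method assumes Java 8.

```java
import java.util.List;

// Hypothetical interface; the name and shape are assumptions.
interface TagAware {
    // Conservative default: implementations that do not know better
    // report true, so the tag-parsing path is never wrongly skipped.
    default boolean hasTags() {
        return true;
    }
}

// A cell that decides once, at construction time, whether it carries tags.
class TaggedCellSketch implements TagAware {
    private final byte[] tags;
    private final boolean hasTags;

    TaggedCellSketch(byte[] tags) {
        this.tags = tags;
        this.hasTags = tags.length > 0;
    }

    @Override
    public boolean hasTags() {
        return hasTags;
    }

    byte[] getTags() {
        return tags;
    }
}

// An implementation that has not been updated: it inherits the default.
class LegacyCellSketch implements TagAware {
}

class TagPath {
    // Counts the cells for which the tag-parsing path would still run.
    static int countCellsNeedingTagParse(List<? extends TagAware> cells) {
        int n = 0;
        for (TagAware cell : cells) {
            if (cell.hasTags()) {
                n++;
            }
        }
        return n;
    }
}
```

With the default being true, only implementations that explicitly report no tags take the cheaper path, so an implementation that has not been touched keeps its current behaviour.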
bq. but since we haven't finished I am not aware of a single perf advantage
Yes, but for the DBE cases I think we do have some advantage.  Because of Cells we no longer
copy the value part. Previously we reconstructed a KV by copying both the key and the value;
now we copy only the key part. If the values are bigger we should see some gain.
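
The difference between the two approaches can be illustrated roughly as below. This is a simplified sketch, not the DBE seeker code: DecodedCellSketch and its methods are hypothetical, and bytesCopied exists only to make the saving visible.

```java
import java.util.Arrays;

// Hypothetical decoded cell: the key is always materialised, while the
// value either lives in its own copy or is referenced inside the block.
class DecodedCellSketch {
    final byte[] key;
    final byte[] valueBuf;
    final int valueOffset;
    final int valueLength;
    final long bytesCopied; // bookkeeping for this sketch only

    private DecodedCellSketch(byte[] key, byte[] valueBuf, int valueOffset,
                              int valueLength, long bytesCopied) {
        this.key = key;
        this.valueBuf = valueBuf;
        this.valueOffset = valueOffset;
        this.valueLength = valueLength;
        this.bytesCopied = bytesCopied;
    }

    // Earlier approach as described above: copy both key and value.
    static DecodedCellSketch copyKeyAndValue(byte[] key, byte[] block,
                                             int valueOffset, int valueLength) {
        byte[] value = Arrays.copyOfRange(block, valueOffset, valueOffset + valueLength);
        return new DecodedCellSketch(key.clone(), value, 0, valueLength,
                key.length + valueLength);
    }

    // Cell-based approach: copy only the key, point at the value in place.
    static DecodedCellSketch copyKeyOnly(byte[] key, byte[] block,
                                         int valueOffset, int valueLength) {
        return new DecodedCellSketch(key.clone(), block, valueOffset, valueLength,
                key.length);
    }

    // Reads the i-th value byte regardless of where the value lives.
    byte valueByte(int i) {
        return valueBuf[valueOffset + i];
    }
}
```

In this sketch the bytes copied per cell drop from key length plus value length to key length alone, which is why bigger values should show a bigger gain.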

> Lift the scan ceiling
> ---------------------
>                 Key: HBASE-13291
>                 URL: https://issues.apache.org/jira/browse/HBASE-13291
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Assignee: stack
>         Attachments: 13291.hacks.txt, 13291.inlining.txt, Screen Shot 2015-03-26 at 12.12.13 PM.png,
> Screen Shot 2015-03-26 at 3.39.33 PM.png, TimeRange.patch, hack_to_bypass_bb.txt,
> nonBBposAndInineMvccVint.txt, q (1).png, scan_no_mvcc_optimized.svg, traces.7.svg,
> traces.filterall.svg, traces.nofilter.svg, traces.small2.svg, traces.smaller.svg
> Scanning medium-sized rows with multiple concurrent scanners exhibits interesting 'ceiling'
> properties. A server runs at about 6.7k ops a second using 450% of a possible 1600% of CPU
> when 4 clients, each with 10 threads, are scanning 1000 rows.  If I add the '--filterAll'
> argument (do not return results), then we run at 1450% of a possible 1600% but we do 8k ops
> a second.
> Let me attach flame graphs for the two cases. Unfortunately, there is some frustrating dark
> art going on. Let me try to figure it... Filing the issue in the meantime to keep score in.

This message was sent by Atlassian JIRA
