hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1938) Make in-memory table scanning faster
Date Sat, 23 Jul 2011 22:23:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070067#comment-13070067 ]

stack commented on HBASE-1938:

bq. I modified the unit test to make it work with the trunk as it is today (new file attached).


Reviewing it, one thing you might want to do is study classes in hbase to get the gist of the
hadoop/hbase style.  Notice how they indent with two spaces rather than tabs and keep lines to
~80 chars.  But that's for the future.  Not important here.

You just need to make sure your KVs have a readPoint that is no greater than the current readPoint.
 It looks like you are making KVs w/o setting memstoreTS.  The default is then used, and it's zero.
  The default read point is also zero.  The compare is <=, so it looks like you don't need to
set the read point at all.  What you have should do no harm.
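
To make the <= point concrete, here is a minimal sketch of that kind of visibility check. It is not the actual HBase code: the Cell class, isVisible and visibleRows are hypothetical names made up for illustration, and the only things taken from the comment above are the zero default and the <= compare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadPointSketch {
  /** Hypothetical stand-in for a KV: just a row key and a memstore timestamp. */
  static final class Cell {
    final String row;
    final long memstoreTs; // stays at 0 when the caller never sets it

    Cell(String row) {
      this(row, 0L);
    }

    Cell(String row, long memstoreTs) {
      this.row = row;
      this.memstoreTs = memstoreTs;
    }
  }

  /** A cell is visible when its timestamp is <= the scanner's read point. */
  static boolean isVisible(Cell cell, long readPoint) {
    return cell.memstoreTs <= readPoint;
  }

  /** Returns the rows a scan at the given read point would see. */
  static List<String> visibleRows(List<Cell> cells, long readPoint) {
    List<String> rows = new ArrayList<String>();
    for (Cell c : cells) {
      if (isVisible(c, readPoint)) {
        rows.add(c.row);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Cell> cells = Arrays.asList(new Cell("a"), new Cell("b", 5L));
    // With both defaults at zero, the untouched cell passes the <= check.
    System.out.println(visibleRows(cells, 0L));
    System.out.println(visibleRows(cells, 5L));
  }
}
```

With both values left at their zero defaults, 0 <= 0 holds, which is why leaving the read point unset in the test should be harmless.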

Your new test class seems fine.  Would be nice to add more tests.  As the memstore data structure
grows, everything slows down.

Another issue is about hacking on the ConcurrentSkipListSet that is the memstore to make it more
suited to our accesses and perhaps to make it go faster (it's public domain when you dig down
into the java src).

bq. On a scan the "next()" part, the hbase currently compare the value of two internals iterators.
In this test, the second list is always empty, hence the cost on comparator is lowered vs.
real life.

What is this that you are referring to?  Is it this? KeyValue kv = scanner.next();
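
For reference, the quoted point about two internal iterators can be sketched as below. This is an illustration only, not the memstore scanner code: MergingScanner, CountingComparator and comparisonsFor are hypothetical names, and plain strings stand in for KVs. It shows why an always-empty second list would make the comparator cost vanish in a test.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class TwoIteratorMergeSketch {
  /** Comparator that counts how often it is invoked. */
  static final class CountingComparator implements Comparator<String> {
    long calls = 0;

    @Override
    public int compare(String a, String b) {
      calls++;
      return a.compareTo(b);
    }
  }

  /** Hypothetical scanner that merges two sorted iterators on next(). */
  static final class MergingScanner {
    private final Iterator<String> first;
    private final Iterator<String> second;
    private final Comparator<String> cmp;
    private String firstNext;
    private String secondNext;

    MergingScanner(Iterator<String> first, Iterator<String> second, Comparator<String> cmp) {
      this.first = first;
      this.second = second;
      this.cmp = cmp;
      this.firstNext = advance(first);
      this.secondNext = advance(second);
    }

    private static String advance(Iterator<String> it) {
      return it.hasNext() ? it.next() : null;
    }

    /** Returns the smaller head element, or null when both sides are exhausted. */
    String next() {
      if (firstNext == null && secondNext == null) {
        return null;
      }
      boolean takeFirst;
      if (secondNext == null) {
        takeFirst = true; // nothing to compare against
      } else if (firstNext == null) {
        takeFirst = false;
      } else {
        takeFirst = cmp.compare(firstNext, secondNext) <= 0;
      }
      String result;
      if (takeFirst) {
        result = firstNext;
        firstNext = advance(first);
      } else {
        result = secondNext;
        secondNext = advance(second);
      }
      return result;
    }
  }

  /** Drains a merged scan and reports how many comparisons it needed. */
  static long comparisonsFor(List<String> a, List<String> b) {
    CountingComparator cmp = new CountingComparator();
    MergingScanner scanner = new MergingScanner(a.iterator(), b.iterator(), cmp);
    while (scanner.next() != null) {
      // drain
    }
    return cmp.calls;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("a", "c", "e");
    System.out.println(comparisonsFor(rows, Collections.<String>emptyList()));
    System.out.println(comparisonsFor(rows, Arrays.asList("b", "d")));
  }
}
```

In this sketch an empty second list needs no comparator calls at all, while two populated lists pay for a compare on most next() calls, so a benchmark with an empty second list would understate the real cost.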

bq. But I don't think it worth a patch just for this (it should be included in a bigger patch

Up to you but yes, the above is probably the way to go.

Thanks N.

> Make in-memory table scanning faster
> ------------------------------------
>                 Key: HBASE-1938
>                 URL: https://issues.apache.org/jira/browse/HBASE-1938
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>         Attachments: MemStoreScanPerformance.java, MemStoreScanPerformance.java, caching-keylength-in-kv.patch,
>
> This issue is about profiling hbase to see if I can make hbase scans run faster when
> all is up in memory.  Talking to some users, they are seeing about 1/4 million rows a second.
> It should be able to go faster than this (Scanning an array of objects, they can do about
> 4-5x this).
