hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3165) some performance things i did
Date Thu, 28 Oct 2010 20:35:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925941#action_12925941 ]

ryan rawson commented on HBASE-3165:

The problem is that the code is pretty ugly and creates a second body of serialization code.  I tried
a lot of things here, and this is just a dump of what I did.  I need to change my measurement
strategy and test which of the 2-3 approaches works best with the least icky code
added.  For example, the final attempt made Result use the ByteBuffer interface
directly, which leaves 2 implementations of the serialization (but only 1 of the deserialization).
I also have a ByteBufferOutputStream which translates OutputStream writes into BB writes;
that would probably be better for code maintainability, and it might be as fast as using
BB directly.  I want proof of this instead of guessing.  Sounds reasonable?
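
For readers unfamiliar with the idea, here is a rough sketch of what an OutputStream-to-ByteBuffer adapter can look like. It is not the class from the attached patches; the name GrowableByteBufferOutputStream, the doubling growth policy, and the toByteBuffer() accessor are all assumptions made for illustration.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not the code from the HBASE-3165 patches): an
 * OutputStream whose writes land in a heap ByteBuffer that grows on demand.
 */
class GrowableByteBufferOutputStream extends OutputStream {
    private ByteBuffer buf;

    GrowableByteBufferOutputStream(int initialCapacity) {
        // Guard against a zero capacity so doubling always makes progress.
        buf = ByteBuffer.allocate(Math.max(1, initialCapacity));
    }

    /** Doubles the backing buffer until `extra` more bytes fit. */
    private void ensureCapacity(int extra) {
        if (buf.remaining() >= extra) {
            return;
        }
        int needed = buf.position() + extra;
        int newCapacity = buf.capacity();
        while (newCapacity < needed) {
            newCapacity *= 2; // assumption: sizes stay far below Integer.MAX_VALUE
        }
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
        buf.flip();      // switch the old buffer to read mode
        bigger.put(buf); // this growth copy is the cost being traded for simplicity
        buf = bigger;
    }

    @Override
    public void write(int b) {
        ensureCapacity(1);
        buf.put((byte) b);
    }

    @Override
    public void write(byte[] src, int off, int len) {
        ensureCapacity(len);
        buf.put(src, off, len); // bulk put, no per-byte loop
    }

    /** Returns a view of the bytes written so far, positioned for reading. */
    ByteBuffer toByteBuffer() {
        ByteBuffer view = buf.duplicate();
        view.flip();
        return view;
    }
}
```

With an adapter like this, existing serialization code can keep writing through java.io.DataOutputStream while the bytes end up in a ByteBuffer, so only one serialization path has to be maintained. Whether that is as fast as calling the ByteBuffer API directly is exactly the thing that needs measuring.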

> some performance things i did
> -----------------------------
>                 Key: HBASE-3165
>                 URL: https://issues.apache.org/jira/browse/HBASE-3165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>         Attachments: HBASE-2165-2.txt, HBASE-2165.txt
> In an attempt to improve the profile of result serialization on the regionserver
side, I did a large number of things to reduce buffer copies, improve API usage efficiency
(using the BB API directly), and so on.
> Using a YCSB config like so:
> recordcount=10000
> #recordcount=5
> operationcount=1000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=0
> updateproportion=0
> scanproportion=1
> insertproportion=0
> fieldlength=10
> fieldcount=100
> requestdistribution=zipfian
> scanlength=300
> scanlengthdistribution=zipfian
> threadcount=1
> columnfamily=data
> Doing a medium sized scan of 1-300 rows.
> Top-line performance was about 67ms, but these micro-improvements didn't budge that
needle, and they didn't change the scale in the CPU profiler, i.e. CPU time spent in serialization
was the same.
> Since then I also made an improvement to HBase-YCSB which may have been masking the performance
gains.  I have suspended this work in favor of 0.90 pre-release work for now.

