hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-3480) Reduce the size of Result serialization
Date Wed, 26 Jan 2011 06:55:43 GMT
Reduce the size of Result serialization
---------------------------------------

                 Key: HBASE-3480
                 URL: https://issues.apache.org/jira/browse/HBASE-3480
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.0
            Reporter: ryan rawson


Even on a gigabit Ethernet connection, transfers are fairly slow.  For example, a 2 MB
reply at a line rate of 120 MB/sec takes about 16 ms to cross a GigE link.  That is a
significant amount of time.
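The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the function name is made up here, decimal megabytes are assumed, and latency and protocol overhead are ignored.

```python
def transfer_time_ms(payload_bytes: float, line_rate_bytes_per_sec: float) -> float:
    """Time to push a payload over a link, ignoring latency and protocol overhead."""
    return payload_bytes / line_rate_bytes_per_sec * 1000.0


MB = 1_000_000  # assumption: decimal megabytes; the issue text does not say which

reply_size = 2 * MB    # the 2 MB reply from the example
line_rate = 120 * MB   # the 120 MB/sec line rate from the example

# 2 / 120 of a second, i.e. between 16 and 17 ms
print(f"{transfer_time_ms(reply_size, line_rate):.1f} ms")
```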

So this JIRA is about reducing the size of the Result[] serialization.  By exploiting family
and qualifier and rowkey duplication, I created a simple encoding scheme to use a dictionary
instead of literal strings.  
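A minimal sketch of the general idea, not the encoding used in the actual patch: each distinct row key, family, and qualifier is stored once in a dictionary, and every cell refers to it by index. The tuple layout, the function names, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical cell layout: (row, family, qualifier, value)
KeyValue = Tuple[bytes, bytes, bytes, bytes]
EncodedCell = Tuple[int, int, int, bytes]


def encode(kvs: List[KeyValue]) -> Tuple[List[bytes], List[EncodedCell]]:
    """Replace repeated row/family/qualifier byte strings with dictionary indexes."""
    dictionary: List[bytes] = []
    index: Dict[bytes, int] = {}

    def ref(item: bytes) -> int:
        # Add the string to the dictionary the first time it is seen.
        if item not in index:
            index[item] = len(dictionary)
            dictionary.append(item)
        return index[item]

    cells = [(ref(row), ref(fam), ref(qual), value) for row, fam, qual, value in kvs]
    return dictionary, cells


def decode(dictionary: List[bytes], cells: List[EncodedCell]) -> List[KeyValue]:
    """Rebuild the literal cells from the dictionary and the index triples."""
    return [(dictionary[r], dictionary[f], dictionary[q], v) for r, f, q, v in cells]


# Made-up data: two rows sharing one family and two qualifiers.
kvs = [
    (b"row-0001", b"info", b"name", b"a"),
    (b"row-0001", b"info", b"email", b"b"),
    (b"row-0002", b"info", b"name", b"c"),
    (b"row-0002", b"info", b"email", b"d"),
]
dictionary, cells = encode(kvs)
print(len(dictionary), "dictionary entries for", len(kvs), "cells")
```

The saving comes from writing each repeated string once; the cost is the extra lookup and bookkeeping per cell, which is one plausible place for serialization time to go.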

In my testing, I am seeing some success with the sizes.  Average serialized size is about
1/2 to 2/3 of the previous size, but time to serialize on the regionserver side is way up,
by a factor of about 10x.  This might be due to the simplistic first implementation, however.

Here is the post change size:
grep 'Serialized size' * |
  perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; print $1, " ", $2, "\n" if $1 > 10000;' |
  cut -f1 -d' ' |
  perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
377047.1125

Here is the pre change size:
grep 'Serialized size' * |
  perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; print $1, " ", $2, "\n" if $1 > 10000;' |
  cut -f1 -d' ' |
  perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
601078.505882353

The new average is about 63% of the old one, a reduction of roughly 37% (put the other way,
the old encoding is about 1.6x larger).
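For readers who prefer not to decode the one-liner, here is a rough Python equivalent of the averaging step, followed by the ratio of the two reported averages. The log line format is taken from the regex in the pipeline; the function name and the `min_size` parameter name are assumptions made for this sketch. Unlike the perl version, lines that do not match the pattern are skipped outright.

```python
import re
from typing import Iterable, Optional

# Same pattern the perl one-liner matches against each grep'd log line.
PATTERN = re.compile(r"Serialized size: (\d+) in (\d+) ns")


def average_size(lines: Iterable[str], min_size: int = 10000) -> Optional[float]:
    """Mean serialized size over log lines whose size exceeds min_size."""
    sizes = []
    for line in lines:
        match = PATTERN.search(line)
        if match and int(match.group(1)) > min_size:
            sizes.append(int(match.group(1)))
    return sum(sizes) / len(sizes) if sizes else None


# The two averages reported in this issue.
post_change = 377047.1125
pre_change = 601078.505882353

ratio = post_change / pre_change
print(f"new/old size ratio: {ratio:.3f}")
print(f"size reduction:     {(1 - ratio) * 100:.1f}%")
```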

But the times are not so good.  Here are some samples from the old implementation, as
(size in bytes) (time in ns):
3874599 10685836
5582725 11525888

So that is about 11 ms to serialize 3.9-5.6 MB of data.

In the new implementation:
1898788 118504672
1630058 91133003

That is 91-119 ms for serialized sizes of 1.6-1.9 MB.
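To put the samples on a common scale, the sketch below converts each (size, time) pair into serialization throughput. The helper name is made up, decimal megabytes are assumed, and the old and new samples are not necessarily the same query results, so this is only a rough comparison.

```python
def throughput_mb_per_sec(size_bytes: int, time_ns: int) -> float:
    """Serialization throughput in MB/sec (decimal megabytes assumed)."""
    return (size_bytes / 1e6) / (time_ns / 1e9)


# (size in bytes, time in ns) samples quoted in this issue.
samples = {
    "old": [(3874599, 10685836), (5582725, 11525888)],
    "new": [(1898788, 118504672), (1630058, 91133003)],
}

for label, rows in samples.items():
    for size_bytes, time_ns in rows:
        rate = throughput_mb_per_sec(size_bytes, time_ns)
        print(f"{label}: {size_bytes} bytes at {rate:.1f} MB/sec")
```

On these numbers the old path serializes at a few hundred MB/sec, while the new one runs below 20 MB/sec, which is slower than the 120 MB/sec line rate the change is trying to relieve.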



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

