hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: Use experience and performance data of offheap from Alibaba online cluster
Date Fri, 18 Nov 2016 17:37:40 GMT
Yes, please, the patches will be useful to the community even if we decide not to backport
into an official 1.x release.


> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> 
> Is the backported patch available anywhere? I'm not seeing it on the
> referenced JIRA. If it ends up not getting officially backported to
> branch-1 because 2.0 is around the corner, some of us who build our own
> deploys may want to integrate it into our builds. Thanks! These numbers
> look great.
> 
>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <anoop.hbase@gmail.com> wrote:
>> 
>> Hi Yu Li,
>>               Good to see that the off heap work helps you..  The perf
>> numbers look great.  So this is a comparison of the on heap L1 cache vs the
>> off heap L2 cache (HBASE-11425 enabled).   So for 2.0 I believe we should
>> make the L2 off heap cache ON by default.  Will raise a jira for that and
>> we can discuss under it.   L2 off heap cache for data blocks and L1 cache
>> for index blocks seems the right choice.
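>> 
>> For anyone who wants to try it, a minimal hbase-site.xml sketch for
>> enabling the off heap L2 bucket cache on 1.x (the size below is a
>> placeholder, not a recommendation):
>> 
>>   <property>
>>     <name>hbase.bucketcache.ioengine</name>
>>     <value>offheap</value>
>>   </property>
>>   <property>
>>     <!-- bucket cache size in MB -->
>>     <name>hbase.bucketcache.size</name>
>>     <value>4096</value>
>>   </property>
>> 
>> With the default combined cache mode this keeps index/bloom blocks in the
>> on heap L1 and puts data blocks in the off heap L2.  Also set
>> HBASE_OFFHEAPSIZE in hbase-env.sh so the JVM gets enough direct memory
>> for the cache.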
>> 
>> Thanks for the backport and the help in testing the feature..  You were
>> able to find some corner case bugs and helped the community fix them..
>> Thanks go to your whole team.
>> 
>> -Anoop-
>> 
>> 
>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <carp84@gmail.com> wrote:
>>> 
>>> Sorry guys, let me retry the inline images:
>>> 
>>> Performance w/o offheap: [inline image]
>>> 
>>> Performance w/ offheap: [inline image]
>>> 
>>> Peak Get QPS of one single RS during Singles' Day (11/11): [inline image]
>>> 
>>> And attaching the files in case inline is still not working:
>>> 
>>> Performance_without_offheap.png
>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
>>> 
>>> Performance_with_offheap.png
>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
>>> 
>>> Peak_Get_QPS_of_Single_RS.png
>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
>>> 
>>> 
>>> Best Regards,
>>> Yu
>>> 
>>>> On 18 November 2016 at 19:29, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> 
>>>> Yu:
>>>> With positive results, more hbase users would be asking for the backport
>>>> of offheap read path patches.
>>>> 
>>>> Do you think you or your coworkers have the bandwidth to publish the
>>>> backport for branch-1?
>>>> 
>>>> Thanks
>>>> 
>>>>> On Nov 18, 2016, at 12:11 AM, Yu Li <carp84@gmail.com> wrote:
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> We have backported the read path offheap work (HBASE-11425) to our
>>>>> customized hbase-1.1.2 (thanks @Anoop for the help/support) and have run
>>>>> it online for more than a month, and would like to share our experience,
>>>>> for what it's worth (smile).
>>>>> 
>>>>> Generally speaking, we gained better and more stable
>>>>> throughput/performance with offheap; below are some details:
>>>>> 1. QPS becomes more stable with offheap
>>>>> 
>>>>> Performance w/o offheap: [inline image]
>>>>> 
>>>>> Performance w/ offheap: [inline image]
>>>>> 
>>>>> These data come from our online A/B test cluster (450 physical
>>>>> machines, each with 256G memory and 64 cores) running real world
>>>>> workloads. They show that with offheap we get a more stable throughput
>>>>> as well as better performance.
>>>>> 
>>>>> We are not showing the full online data here because online we published
>>>>> the version with both offheap and NettyRpcServer together, so there is no
>>>>> standalone comparison for offheap.
>>>>> 
>>>>> 2. Full GC frequency and cost
>>>>> 
>>>>> Average Full GC STW time reduced from 11s to 7s with offheap.
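>>>>> 
>>>>> (In case anyone wants to reproduce such pause measurements: on JDK 8 the
>>>>> STW times can be read from GC logs enabled with standard HotSpot flags,
>>>>> e.g. -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime
>>>>> -Xloggc:<path> in HBASE_REGIONSERVER_OPTS; not necessarily the exact
>>>>> options we used.)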
>>>>> 
>>>>> 3. Young GC frequency and cost
>>>>> 
>>>>> No performance degradation observed with offheap.
>>>>> 
>>>>> 4. Peak throughput of one single RS
>>>>> 
>>>>> On Singles' Day (11/11), the peak throughput of one single RS reached
>>>>> 100K QPS, of which 90K were Gets. Combined with network in/out data, we
>>>>> can infer that the average result size of a Get request is ~1KB.
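>>>>> 
>>>>> (Illustrative arithmetic only, with made-up round numbers: at ~90K
>>>>> Gets/s, ~90MB/s of network out for results would mean 90MB/s / 90K/s
>>>>> ≈ 1KB per Get response.)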
>>>>> 
>>>>> [inline image: Peak Get QPS of one single RS]
>>>>> 
>>>>> Offheap is used on all online machines (more than 1600 nodes) in place
>>>>> of LruCache, so the above QPS is served from the offheap bucketcache,
>>>>> along with NettyRpcServer (HBASE-15756).
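>>>>> 
>>>>> (For reference, with the upstream pluggable RPC server work the Netty
>>>>> based server is selected in hbase-site.xml roughly like below; the class
>>>>> name is the upstream 2.0 one and may differ in a customized 1.1 build:
>>>>> 
>>>>>   <property>
>>>>>     <name>hbase.rpc.server.impl</name>
>>>>>     <value>org.apache.hadoop.hbase.ipc.NettyRpcServer</value>
>>>>>   </property>
>>>>> )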
>>>>> 
>>>>> Just let us know if you have any comments. Thanks.
>>>>> 
>>>>> Best Regards,
>>>>> Yu
>>>> 
>>> 
>>> 
>> 
