hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Use experience and performance data of offheap from Alibaba online cluster
Date Sat, 19 Nov 2016 10:03:26 GMT
Opening a JIRA would be fine. 
This makes it easier for people to obtain the patch(es). 

Cheers

> On Nov 18, 2016, at 11:35 PM, Anoop John <anoop.hbase@gmail.com> wrote:
> 
> Because of some compatibility issues, we decided that this will be done
> in 2.0 only. As Andy said, it would be great to share the 1.x
> backported patches. Is it one mega patch at your end, or issue-by-issue
> patches? The latter would be best. Please share the patches somewhere,
> along with a list of the issues backported. I can help verify the
> issues so as to make sure we don't miss any.
> 
> -Anoop-
> 
>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <enis.soz@gmail.com> wrote:
>> Thanks for sharing this. Great work.
>> 
>> I don't see any reason why we cannot backport to branch-1.
>> 
>> Enis
>> 
>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>> 
>>> Yes, please, the patches will be useful to the community even if we decide
>>> not to backport into an official 1.x release.
>>> 
>>> 
>>>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>>>> 
>>>> Is the backported patch available anywhere? Not seeing it on the
>>>> referenced JIRA. If it ends up not getting officially backported to
>>>> branch-1 due to 2.0 being around the corner, some of us who build our
>>>> own deploys may want to integrate it into our builds. Thanks! These
>>>> numbers look great.
>>>> 
>>>>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <anoop.hbase@gmail.com> wrote:
>>>>> 
>>>>> Hi Yu Li,
>>>>> Good to see that the off heap work helps you. The perf numbers look
>>>>> great. So this is a comparison of the on-heap L1 cache vs the
>>>>> off-heap L2 cache (HBASE-11425 enabled). For 2.0 I believe we should
>>>>> make the off-heap L2 cache ON by default. Will raise a JIRA for that
>>>>> so we can discuss under it. L2 off-heap cache for data blocks and L1
>>>>> cache for index blocks seems the right choice.
>>>>>
>>>>> Thanks for the backport and the help in testing the feature. You were
>>>>> able to find some corner-case bugs and helped the community fix them.
>>>>> Thanks go to your whole team.
>>>>>
>>>>> -Anoop-
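
For reference, the split Anoop describes (data blocks in an off-heap L2 BucketCache, index/bloom blocks in the on-heap L1) is driven by configuration rather than code. A minimal sketch of the relevant settings, assuming a deployment along the lines discussed in this thread; the sizes below are illustrative only, not values taken from the thread:

    <!-- hbase-site.xml: enable the off-heap L2 bucket cache -->
    <property>
      <name>hbase.bucketcache.ioengine</name>
      <value>offheap</value>
    </property>
    <property>
      <!-- illustrative capacity in MB; must fit the direct-memory budget -->
      <name>hbase.bucketcache.size</name>
      <value>8192</value>
    </property>

    # hbase-env.sh: direct-memory budget backing the bucket cache (illustrative)
    export HBASE_OFFHEAPSIZE=10G

With hbase.bucketcache.combinedcache.enabled left at its default of true, data blocks go to the L2 bucket cache while index and bloom blocks stay in the on-heap L1, matching the split suggested above.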
>>>>> 
>>>>> 
>>>>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <carp84@gmail.com> wrote:
>>>>>> 
>>>>>> Sorry guys, let me retry the inline images:
>>>>>>
>>>>>> Performance w/o offheap: [inline image]
>>>>>>
>>>>>> Performance w/ offheap: [inline image]
>>>>>>
>>>>>> Peak Get QPS of one single RS during Singles' Day (11/11): [inline image]
>>>>>>
>>>>>> And attaching the files in case inline is still not working:
>>>>>> 
>>>>>> Performance_without_offheap.png
>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
>>>>>>
>>>>>> Performance_with_offheap.png
>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
>>>>>>
>>>>>> Peak_Get_QPS_of_Single_RS.png
>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
>>>>>> 
>>>>>> Best Regards,
>>>>>> Yu
>>>>>> 
>>>>>>> On 18 November 2016 at 19:29, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>> 
>>>>>>> Yu:
>>>>>>> With positive results, more hbase users would be asking for the
>>>>>>> backport of the offheap read path patches.
>>>>>>>
>>>>>>> Do you think you or your coworkers have the bandwidth to publish a
>>>>>>> backport for branch-1?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>>> On Nov 18, 2016, at 12:11 AM, Yu Li <carp84@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> We have backported the read path offheap work (HBASE-11425) to our
>>>>>>>> customized hbase-1.1.2 (thanks @Anoop for the help/support) and run
>>>>>>>> it online for more than a month, and would like to share our
>>>>>>>> experience, for what it's worth (smile).
>>>>>>>>
>>>>>>>> Generally speaking, we gained better and more stable
>>>>>>>> throughput/performance with offheap; below are some details:
>>>>>>>>
>>>>>>>> 1. QPS becomes more stable with offheap
>>>>>>>>
>>>>>>>> Performance w/o offheap: [inline image]
>>>>>>>>
>>>>>>>> Performance w/ offheap: [inline image]
>>>>>>>>
>>>>>>>> These data come from our online A/B test cluster (450 physical
>>>>>>>> machines, each with 256G memory + 64 cores) under real-world
>>>>>>>> workloads. They show that with offheap we gain more stable
>>>>>>>> throughput as well as better performance.
>>>>>>>>
>>>>>>>> We are not showing the fully online data here because online we
>>>>>>>> published the version with both offheap and NettyRpcServer
>>>>>>>> together, so there is no standalone comparison data for offheap.
>>>>>>>>
>>>>>>>> 2. Full GC frequency and cost
>>>>>>>>
>>>>>>>> Average Full GC STW time reduced from 11s to 7s with offheap.
>>>>>>>>
>>>>>>>> 3. Young GC frequency and cost
>>>>>>>>
>>>>>>>> No performance degradation observed with offheap.
>>>>>>>>
>>>>>>>> 4. Peak throughput of one single RS
>>>>>>>>
>>>>>>>> On Singles' Day (11/11), the peak throughput of one single RS
>>>>>>>> reached 100K QPS, among which 90K were from Get. Combining this
>>>>>>>> with the internet in/out data, we know the average result size of
>>>>>>>> a get request is ~1KB.
>>>>>>>>
>>>>>>>> [inline image]
>>>>>>>>
>>>>>>>> Offheap is used on all online machines (more than 1600 nodes) in
>>>>>>>> place of LruCache, so the above QPS is gained from the offheap
>>>>>>>> bucketcache, along with NettyRpcServer (HBASE-15756).
>>>>>>>>
>>>>>>>> Just let us know if you have any comments. Thanks.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Yu
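
Since the gains above are attributed to the offheap bucketcache together with NettyRpcServer (HBASE-15756), here is a minimal sketch of how a deployment might opt into the Netty-based RPC server via the pluggable server key from that JIRA; treat the property name and class as assumptions to verify against your HBase version:

    <!-- hbase-site.xml: select the Netty-based RPC server implementation -->
    <!-- (assumed key/class from HBASE-15756; verify for your release) -->
    <property>
      <name>hbase.rpc.server.impl</name>
      <value>org.apache.hadoop.hbase.ipc.NettyRpcServer</value>
    </property>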
