hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: DISCUSSION: lets do a developer workshop on near-term work
Date Sun, 19 Jul 2015 04:50:05 GMT
Hi Ram,

Do you have any targets for what you are measuring? What are the goals you guys are working
toward with the off heaping changes? 


> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> 
> Thanks Vladimir.
> Yeah, the reports that were attached specifically captured the 95/99th
> percentile.
> The reason for checking the server side perf was specifically to see the
> improvement on the server side; also, the client was sending large
> results in multiple threads, so we wanted to avoid the n/w interference. I
> think it was a general practice that we were following.
> We will do some more tests and get some latest readings with bigger data
> sets.
> Sent from mobile.
>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purtell@gmail.com> wrote:
>> 
>> +1
>> 
>> Yeah, something like that, with aspirational targets for improvement from
>> current releases. Then what to measure, the tests to run, and criteria for
>> evaluation are clear and organized and we're able to better assess how the
>> work in progress is meeting its goals (or not)
>> 
>> 
>> 
>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodionov@gmail.com>
>> wrote:
>> 
>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
>>> cache. In the entire read path, we can refer to this offheap buffer and
>>> avoid onheap copying.
>>> 
>>> I think, on a read path, the most important improvement we could imagine
>>> is elimination or reduction of object creation (KVs, iterators, etc.):
>>> object reuse, byte buffer reuse or offheap buffer reuse, API changes, etc.
>>> If this is a part of this JIRA, then I would easily define a goal:
>>> improving the 95th/99th percentile latency of read operations. Not
>>> performance, but latency matters.
>>> 
>>> -Vlad
>>> 
>>> 
>>> 
>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purtell@gmail.com>
>>> wrote:
>>> 
>>>> That's not a realistic or useful test scenario, unless the goal is to
>>>> accelerate queries where all cells are filtered at the server.
>>>> 
>>>> 
>>>> 
>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>>>>> 
>>>>> No, Andy. 11425 has a doc attached to it. At the end of it, we have added
>>>>> perf numbers from cluster testing. This was done using PE get and scan
>>>>> tests with all cells filtered at the server (so as not to factor in n/w
>>>>> bandwidth constraints).
>>>>> 
>>>>> -Anoop-
>>>>> 
>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purtell@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> We have some microbenchmarks, not evidence of differences seen from a
>>>>>> client application. I'm not saying that microbenchmarks are not totally
>>>>>> necessary and a great start - they are - but that they don't measure an
>>>>>> end goal. Furthermore unless I've missed one somewhere we don't have a
>>>>>> JIRA or design doc that states a clear end goal metric like the strawman
>>>>>> I threw together in my previous mail. A measurable system level goal and
>>>>>> some data from full cluster testing would go a lot further toward letting
>>>>>> all of us evaluate the potential and payoff of the work. In the meantime
>>>>>> we should probably be assembling these changes on a branch instead of in
>>>>>> trunk, for as long as the goal is not clearly defined and the payoff and
>>>>>> potential for perf regressions is untested and unknown.
>>>>>> 
>>>>>> 
>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>>>>>>> 
>>>>>>> Thanks Andy and Lars. The parent jira has a doc attached which contains
>>>>>>> some perf gain numbers. We will be doing more tests in the next 2 weeks
>>>>>>> (before the end of this month) and will publish them. Yes, it will be
>>>>>>> great if it is at a more IST-friendly time :-)
>>>>>>> 
>>>>>>> -Anoop-
>>>>>>> 
>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purtell@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
>>>>>>>>> argue both sides of a discussion and to never take sides easily
>>>>>>>>> (drives some folks crazy).
>>>>>>>> 
>>>>>>>> I can vouch for this (smile)
>>>>>>>> 
>>>>>>>> I also can offer support for off heaping there. At the same time we do
>>>>>>>> have a gap where we can't point to a timeline of improvements (yet,
>>>>>>>> anyway) with benchmarks showing gains where your goals need them. For
>>>>>>>> example, stock HBase in one JVM can address max N GB for response time
>>>>>>>> distribution D; dev version of HBase in off heap branch can address max
>>>>>>>> N' GB for distribution D', where N' > N and D > D' (distribution D'
>>>>>>>> statistically shows better/lower response times).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <larsh@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> I'm in favor of anything that improves performance (and preferably
>>>>>>>>> doesn't set us back into a world that's worse than C due to the lack
>>>>>>>>> of pointers in Java). Never said "I don't like it", it's just that I'm
>>>>>>>>> perhaps asking for more numbers and justification in weighing the pros
>>>>>>>>> and cons.
>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
>>>>>>>>> argue both sides of a discussion and to never take sides easily
>>>>>>>>> (drives some folks crazy). And Stack's there too, he'll yell at me
>>>>>>>>> where needed :)
>>>>>>>>> 
>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
>>>>>>>>> chance that folks on IST can participate. (I know that some of our
>>>>>>>>> folks on IST would love to participate in the backup discussion.)
>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
>>>>>>>>> an approx. number of folks.
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com>
>>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>
>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>>>>> Hi
>>>>>>>>> What time will it be on August 26th?
>>>>>>>>> @Lars Ya. I know that you are not generally in favour of this
>>>>>>>>> offheaping stuff. Maybe if we (from India) can attend this meeting
>>>>>>>>> remotely, your thoughts can be discussed and also the current state of
>>>>>>>>> this work.
>>>>>>>>> Regards
>>>>>>>>> Ram
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <larsh@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is "grouped"
>>>>>>>>> by tenant. Would like to sync up on that (hopefully some of the folks
>>>>>>>>> who wrote most of the code will be in town, I'll check).
>>>>>>>>> 
>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>>>>> Would like to add the following topics:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - "Replication". We found that replication cannot keep up with high
>>>>>>>>> write loads, due to the fact that replication is strictly single
>>>>>>>>> threaded per regionserver (even though we have multiple region servers
>>>>>>>>> on the sink side)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> OK... Out now to make a "bullshit hat".
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>> ________________________________
>>>>>>>>> From: Sean Busbey <busbey@cloudera.com>
>>>>>>>>> To: dev <dev@hbase.apache.org>
>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Sean
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> I can be up in your area in August.
>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis.soz@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if
>>>>>>>>>>>> possible.
>>>>>>>>>>>> 
>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
>>>>>>>>>>> (Mikhail on the 20th).
>>>>>>>>>>> St.Ack
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Enis
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Matteo and I were thinking it is time devs got together for a pow-wow.
>>>>>>>>>>>>> There is a bunch of stuff in flight at the moment (see below list) and
>>>>>>>>>>>>> it would be good to meet and whiteboard, surface good ideas that have
>>>>>>>>>>>>> gone dormant in JIRA, or revisit designs/proposals out in JIRA-attached
>>>>>>>>>>>>> google docs that need socializing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue
>>>>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>>>>> + Time (Enis)
>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to
>>>>>>>>>>>>> take over a topic or put your name by one, just say. Suggest that each
>>>>>>>>>>>>> discussion lead off with a 5-10 minute overview of the current state of
>>>>>>>>>>>>> thought/design/implementation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do others think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What date would suit folks?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Anyone want to host?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Matteo and St.Ack
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> 
>>>>>>>>>> - Andy
>>>>>>>>>> 
>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>>>>> (via Tom White)
>> 
