hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: DISCUSSION: lets do a developer workshop on near-term work
Date Mon, 20 Jul 2015 16:29:10 GMT
Cool, thanks. 

Is a 20% latency reduction the most we can expect or do you think there is room for more improvement?
Just curious. 

Is latency reduction the only goal? Anything here about supporting larger heaps? Is there
something we can measure in that regard?

Hope you see my point and there's enough here to prime a goals and metrics discussion at the
pow wow or on the relevant JIRAs. 

> On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> 
> Hi Andy
> 
> Based on the POCs we have done, we expect around a 20% improvement in latency.  For
> scans it will be a little less than 20%.
> 
> Regards
> Ram
> 
> 
> On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
> 
>> Hi Ram,
>> 
>> Do you have any targets for what you are measuring? What are the goals you
>> guys are working toward with the off heaping changes?
>> 
>> 
>>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
>>> ramkrishna.s.vasudevan@gmail.com> wrote:
>>> 
>>> Thanks Vladimir.
>>> Yeah, the reports that were attached specifically captured the 95/99th
>>> percentile.
>>> The reason for checking the server side perf was to specifically see the
>>> improvement in the server side and also the client was sending large
>>> results in multiple threads. So wanted to avoid the n/w interference. I
>>> think it was a general practice that we were following.
>>> We will do some more tests and get some latest readings with bigger data
>>> sets.
>>> Sent from mobile.
>>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purtell@gmail.com> wrote:
>>>> 
>>>> +1
>>>> 
>>>> Yeah, something like that, with aspirational targets for improvement from
>>>> current releases. Then what to measure, the tests to run, and criteria for
>>>> evaluation are clear and organized and we're able to better assess how the
>>>> work in progress is meeting its goals (or not)
>>>> 
>>>> 
>>>> 
>>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>> 
>>>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
>>>>>>> cache. In the entire read path, we can refer to this offheap buffer and
>>>>>>> avoid onheap copying.
>>>>> 
>>>>> I think, on a read path, the most important improvement we could imagine is
>>>>> elimination or reduction of object creation (KVs, iterators etc.):
>>>>> object reuse, byte buffer reuse or offheap buffer reuse, API change etc.
>>>>> If this is a part of this JIRA, then I would easily define a goal:
>>>>> improving 95th/99th percentile latency of read operations. Not performance, but
>>>>> latency matters
>>>>> 
>>>>> -Vlad
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>>>> 
>>>>>> That's not a realistic or useful test scenario, unless the goal is to
>>>>>> accelerate queries where all cells are filtered at the server.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>>>>>>> 
>>>>>>> No Andy. 11425 has a doc attached to it. At the end of it, we have added
>>>>>>> perf numbers from cluster testing.  This was done using PE get and scan
>>>>>>> tests with filtering all cells at server (to not consider n/w bandwidth
>>>>>>> constraints)
>>>>>>> 
>>>>>>> -Anoop-
>>>>>>> 
>>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>>>>>> 
>>>>>>>> We have some microbenchmarks, not evidence of differences seen from a
>>>>>>>> client application. I'm not saying that microbenchmarks are not totally
>>>>>>>> necessary and a great start - they are - but that they don't measure an end
>>>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
>>>>>>>> design doc that states a clear end goal metric like the strawman I threw
>>>>>>>> together in my previous mail. A measurable system level goal and some data
>>>>>>>> from full cluster testing would go a lot further toward letting all of us
>>>>>>>> evaluate the potential and payoff of the work. In the meantime we should
>>>>>>>> probably be assembling these changes on a branch instead of in trunk, for
>>>>>>>> as long as the goal is not clearly defined and the payoff and potential for
>>>>>>>> perf regressions is untested and unknown.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Andy and Lars.  The parent jira has a doc attached which contains some
>>>>>>>>> perf gain numbers.  We will be doing more tests in the next 2 weeks (before
>>>>>>>>> the end of this month) and will publish them.   Yes it will be great if it is
>>>>>>>>> a more IST friendly time :-)
>>>>>>>>> 
>>>>>>>>> -Anoop-
>>>>>>>>> 
>>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always argue
>>>>>>>>>>> both sides of a discussion and to never take sides easily (drives some folks
>>>>>>>>>>> crazy).
>>>>>>>>>> 
>>>>>>>>>> I can vouch for this (smile)
>>>>>>>>>> 
>>>>>>>>>> I also can offer support for off heaping there. At the same time we do
>>>>>>>>>> have a gap where we can't point to a timeline of improvements (yet, anyway)
>>>>>>>>>> with benchmarks showing gains where your goals need them. For example,
>>>>>>>>>> stock HBase in one JVM can address max N GB for response time distribution
>>>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
>>>>>>>>>> distribution D', where N' > N and D > D' (distribution D' statistically
>>>>>>>>>> shows better/lower response times).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <larsh@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I'm in favor of anything that improves performance (and preferably
>>>>>>>>>>> doesn't set us back into a world that's worse than C due to the lack of
>>>>>>>>>>> pointers in Java). Never said "I don't like it", it's just that I'm perhaps
>>>>>>>>>>> asking for more numbers and justification in weighing the pros and cons.
>>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always argue
>>>>>>>>>>> both sides of a discussion and to never take sides easily (drives some folks
>>>>>>>>>>> crazy). And Stack's there too, he'll yell at me where needed :)
>>>>>>>>>>> 
>>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
>>>>>>>>>>> chance that folks on IST can participate. I know that some of our folks on
>>>>>>>>>>> IST would love to participate in the backup discussion.
>>>>>>>>>>> 
>>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
>>>>>>>>>>> an approx. number of folks.
>>>>>>>>>>> 
>>>>>>>>>>> -- Lars
>>>>>>>>>>> 
>>>>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com>
>>>>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>
>>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>>>>>>> 
>>>>>>>>>>> Hi
>>>>>>>>>>> What time will it be on August 26th?
>>>>>>>>>>> @Lars
>>>>>>>>>>> Ya. I know that you are not generally in favour of this offheaping
>>>>>>>>>>> stuff.  Maybe if we (from India) can attend this meeting remotely your
>>>>>>>>>>> thoughts can be discussed and also the current state of this work.
>>>>>>>>>>> Regards
>>>>>>>>>>> Ram
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <larsh@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is "grouped" by
>>>>>>>>>>> tenant. Would like to sync up on that (hopefully some of the folks who
>>>>>>>>>>> wrote most of the code will be in town, I'll check).
>>>>>>>>>>> 
>>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
>>>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>>>>>>> Would like to add the following topics:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - "Replication". We found that replication cannot keep up with high
>>>>>>>>>>> write loads, due to the fact that replication is strictly single threaded
>>>>>>>>>>> per regionserver (even though we have multiple region servers on the sink
>>>>>>>>>>> side)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> OK... Out now to make a "bullshit hat".
>>>>>>>>>>> 
>>>>>>>>>>> -- Lars
>>>>>>>>>>> 
>>>>>>>>>>> ________________________________
>>>>>>>>>>> From: Sean Busbey <busbey@cloudera.com>
>>>>>>>>>>> To: dev <dev@hbase.apache.org>
>>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Sean
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I can be up in your area in August.
>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis.soz@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'll be off starting the 25th of July, so I prefer something next week if
>>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
>>>>>>>>>>>>> the 20th).
>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Enis
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow. There
>>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list) and it would
>>>>>>>>>>>>>>> be good to meet and whiteboard, surface good ideas that have gone dormant
>>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google doc that
>>>>>>>>>>>>>>> need socializing.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue
>>>>>>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>>>>>>> + Time (Enis)
>>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to take
>>>>>>>>>>>>>>> over a topic or put your name by one, just say.  Suggest that discussion
>>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
>>>>>>>>>>>>>>> thought/design/implementation.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What do others think?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What date would suit folks?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Anyone want to host?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Matteo and St.Ack
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> 
>>>>>>>>>>>> - Andy
>>>>>>>>>>>> 
>>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>>>>>>> (via Tom White)
>> 
