hbase-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: DISCUSSION: lets do a developer workshop on near-term work
Date Mon, 20 Jul 2015 11:49:36 GMT
Can y'all move discussion of the off heaping work (or perf feature dev
generally) to a new thread?

-- 
Sean
On Jul 20, 2015 6:44 AM, "ramkrishna vasudevan" <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi Andy
>
> Based on the POCs we have done, we expect around a 20% improvement in
> latency. For scans it will be a little less than 20%.
>
> Regards
> Ram
>
>
> On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
>
> > Hi Ram,
> >
> > Do you have any targets for what you are measuring? What are the goals
> > you guys are working toward with the off-heaping changes?
> >
> >
> > > On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan
> > > <ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > Thanks Vladimir.
> > > Yeah, the attached reports specifically captured the 95th/99th
> > > percentiles.
> > > We checked server-side perf specifically to see the improvement on the
> > > server side, and also because the client was sending large results in
> > > multiple threads, so we wanted to avoid network interference. I think
> > > this is the general practice we have been following.
> > > We will do some more tests and get fresh readings with bigger data
> > > sets.
> > > Sent from mobile.
> > >> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purtell@gmail.com>
> > >> wrote:
> > >>
> > >> +1
> > >>
> > >> Yeah, something like that, with aspirational targets for improvement
> > >> over current releases. Then what to measure, the tests to run, and the
> > >> criteria for evaluation are clear and organized, and we're able to better
> > >> assess how the work in progress is meeting its goals (or not).
> > >>
> > >>
> > >>
> > >> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > >> wrote:
> > >>
> > >>>>> Umbrella jira to make sure we can have blocks cached in an offheap-backed
> > >>>>> cache. In the entire read path, we can refer to this offheap buffer and
> > >>>>> avoid onheap copying.
> > >>>
> > >>> I think that on the read path, the most important improvement we could
> > >>> imagine is eliminating or reducing object creation (KVs, iterators,
> > >>> etc.) through object reuse, byte buffer or offheap buffer reuse, API
> > >>> changes, etc. If this is part of this JIRA, then I could easily define a
> > >>> goal: improving the 95th/99th percentile latency of read operations. It
> > >>> is latency that matters, not raw performance.
> > >>>
> > >>> -Vlad
> > >>>
> > >>>
> > >>>
> > >>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> That's not a realistic or useful test scenario, unless the goal is to
> > >>>> accelerate queries where all cells are filtered at the server.
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hbase@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>> No, Andy. 11425 has a doc attached to it. At the end of it, we have
> > >>>>> added perf numbers from cluster testing. This was done using PE get and
> > >>>>> scan tests with all cells filtered at the server (to take network
> > >>>>> bandwidth constraints out of the picture).
> > >>>>>
> > >>>>> -Anoop-
> > >>>>>
> > >>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> We have some microbenchmarks, not evidence of differences seen from a
> > >>>>>> client application. I'm not saying that microbenchmarks aren't totally
> > >>>>>> necessary and a great start (they are), but that they don't measure an
> > >>>>>> end goal. Furthermore, unless I've missed one somewhere, we don't have a
> > >>>>>> JIRA or design doc that states a clear end-goal metric like the strawman
> > >>>>>> I threw together in my previous mail. A measurable system-level goal and
> > >>>>>> some data from full cluster testing would go a lot further toward
> > >>>>>> letting all of us evaluate the potential and payoff of the work. In the
> > >>>>>> meantime we should probably be assembling these changes on a branch
> > >>>>>> instead of in trunk, for as long as the goal is not clearly defined and
> > >>>>>> the payoff and potential for perf regressions are untested and unknown.
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hbase@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> Thanks Andy and Lars. The parent jira has a doc attached which contains
> > >>>>>>> some perf gain numbers. We will be doing more tests in the next 2 weeks
> > >>>>>>> (before the end of this month) and will publish them. Yes, it will be
> > >>>>>>> great if it is at a more IST-friendly time :-)
> > >>>>>>>
> > >>>>>>> -Anoop-
> > >>>>>>>
> > >>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> > >>>>>>>>> argue both sides of a discussion and to never take sides easily (drives
> > >>>>>>>>> some folks crazy).
> > >>>>>>>>
> > >>>>>>>> I can vouch for this (smile)
> > >>>>>>>>
> > >>>>>>>> I can also offer support for off heaping there. At the same time, we do
> > >>>>>>>> have a gap where we can't point to a timeline of improvements (yet,
> > >>>>>>>> anyway) with benchmarks showing gains where your goals need them. For
> > >>>>>>>> example: stock HBase in one JVM can address at most N GB for response
> > >>>>>>>> time distribution D; the dev version of HBase in the off-heap branch can
> > >>>>>>>> address at most N' GB for distribution D', where N' > N and D > D'
> > >>>>>>>> (distribution D' statistically shows better/lower response times).
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <larsh@apache.org>
> > >>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> I'm in favor of anything that improves performance (and preferably
> > >>>>>>>>> doesn't set us back into a world that's worse than C due to the lack of
> > >>>>>>>>> pointers in Java). I never said "I don't like it"; it's just that I'm
> > >>>>>>>>> perhaps asking for more numbers and justification in weighing the pros
> > >>>>>>>>> and cons.
> > >>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> > >>>>>>>>> argue both sides of a discussion and to never take sides easily (drives
> > >>>>>>>>> some folks crazy). And Stack's there too; he'll yell at me where needed :)
> > >>>>>>>>>
> > >>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
> > >>>>>>>>> chance that folks on IST can participate. I know that some of our folks
> > >>>>>>>>> on IST would love to participate in the backup discussion.
> > >>>>>>>>>
> > >>>>>>>>> Like Enis, I'm also happy to host. We're in downtown SF. I'd just need
> > >>>>>>>>> an approximate number of folks.
> > >>>>>>>>>
> > >>>>>>>>> -- Lars
> > >>>>>>>>>
> > >>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com>
> > >>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>
> > >>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>
> > >>>>>>>>> Hi
> > >>>>>>>>> What time will it be on August 26th?
> > >>>>>>>>> @Lars: Yes, I know that you are not generally in favour of this
> > >>>>>>>>> offheaping stuff. Maybe if we (from India) can attend this meeting
> > >>>>>>>>> remotely, your thoughts and the current state of this work can both be
> > >>>>>>>>> discussed.
> > >>>>>>>>> Regards
> > >>>>>>>>> Ram
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <larsh@apache.org>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> > >>>>>>>>> We have done a _lot_ of work on backups as well. Ours are more
> > >>>>>>>>> complicated, as we wanted fast per-tenant restores, so data is
> > >>>>>>>>> "grouped" by tenant. Would like to sync up on that (hopefully some of
> > >>>>>>>>> the folks who wrote most of the code will be in town; I'll check).
> > >>>>>>>>>
> > >>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
> > >>>>>>>>> usually do not like what I think about the offheap efforts :) ).
> > >>>>>>>>> Would like to add the following topics:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> - "Timestamp Resolution", i.e. making space for more bits in the
> > >>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> - "Replication". We found that replication cannot keep up with high
> > >>>>>>>>> write loads, because replication is strictly single-threaded per
> > >>>>>>>>> region server (even though we have multiple region servers on the sink
> > >>>>>>>>> side).
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> - "Spark integration" (Ted Malaska?)
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> OK... Out now to make a "bullshit hat".
> > >>>>>>>>>
> > >>>>>>>>> -- Lars
> > >>>>>>>>>
> > >>>>>>>>> ________________________________
> > >>>>>>>>> From: Sean Busbey <busbey@cloudera.com>
> > >>>>>>>>> To: dev <dev@hbase.apache.org>
> > >>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> I'm planning to be in the Bay Area the week of the 24th of August.
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Sean
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurtell@apache.org>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I can be up in your area in August.
> > >>>>>>>>>>
> > >>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <stack@duboce.net>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis.soz@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'll be off starting the 25th of July, so I prefer something next week
> > >>>>>>>>>>>> if possible.
> > >>>>>>>>>>>>
> > >>>>>>>>>>> You ever coming back? If so, when? I'm back on the 10th of August
> > >>>>>>>>>>> (Mikhail on the 20th).
> > >>>>>>>>>>> St.Ack
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Enis
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <stack@duboce.net>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Matteo and I were thinking it's time devs got together for a pow-wow.
> > >>>>>>>>>>>>> There is a bunch of stuff in flight at the moment (see list below) and
> > >>>>>>>>>>>>> it would be good to meet and whiteboard, surface good ideas that have
> > >>>>>>>>>>>>> gone dormant in JIRA, or revisit designs/proposals in JIRA-attached
> > >>>>>>>>>>>>> Google docs that need socializing.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> You can only come if you are
wearing your bullshit hat.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Topics we'd go over could include:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> + Our filesystem layout will not work with 1M regions (Matteo/Stack)
> > >>>>>>>>>>>>> + Current state of the offheaping of the read path and alternate
> > >>>>>>>>>>>>> KeyValue implementation (Anoop/Ram)
> > >>>>>>>>>>>>> + Append rejigger (Elliott)
> > >>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > >>>>>>>>>>>>> + Splitting meta/1M regions
> > >>>>>>>>>>>>> + The revived Backup (Vladimir)
> > >>>>>>>>>>>>> + Time (Enis)
> > >>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > >>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > >>>>>>>>>>>>> + hbase-2.0.0
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to
> > >>>>>>>>>>>>> take over a topic or put your name by one, just say. I suggest that
> > >>>>>>>>>>>>> each discussion lead off with a 5-10 minute overview of the current
> > >>>>>>>>>>>>> state of thought/design/implementation.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> What do others think?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> What date would suit folks?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Anyone want to host?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Matteo and St.Ack
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>> Best regards,
> > >>>>>>>>>>
> > >>>>>>>>>> - Andy
> > >>>>>>>>>>
> > >>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > >>>>>>>>>> (via Tom White)
> > >>
> >
>
