hbase-dev mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: DISCUSSION: lets do a developer workshop on near-term work
Date Mon, 20 Jul 2015 16:50:47 GMT
We will be doing some more large-data tests in the coming week, Andy, and will
report back. We will also do a write-up on all the ways the work might help
us. As Sean said, we will continue in another thread if there is anything
further. Will write back soon with the test results. Thanks.

-Anoop-

On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <andrew.purtell@gmail.com>
wrote:

> Cool, thanks.
>
> Is a 20% latency reduction the most we can expect or do you think there is
> room for more improvement? Just curious.
>
> Is latency reduction the only goal? Anything here about supporting larger
> heaps? Is there something we can measure in that regard?
>
> Hope you see my point and there's enough here to prime a goals and metrics
> discussion at the pow wow or on the relevant JIRAs.
>
> > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > Hi Andy
> >
> > Based on the POCs we have done, we expect around a 20% improvement in
> > latency. For scans it will be a little less than 20%.
> >
> > Regards
> > Ram
> >
> >
> > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >
> >> Hi Ram,
> >>
> >> Do you have any targets for what you are measuring? What are the goals
> >> you guys are working toward with the off heaping changes?
> >>
> >>
> >> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >>>
> >>> Thanks Vladimir.
> >>> Yeah, the reports that were attached specifically captured the 95th/99th
> >>> percentiles.
> >>> The reason for checking the server-side perf was to see the improvement
> >>> on the server side specifically; also, the client was sending large
> >>> results in multiple threads, so we wanted to avoid network interference.
> >>> I think it was a general practice that we were following.
> >>> We will do some more tests and get the latest readings with bigger data
> >>> sets.
> >>> Sent from mobile.
> >>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purtell@gmail.com> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> Yeah, something like that, with aspirational targets for improvement
> >>>> from current releases. Then what to measure, the tests to run, and
> >>>> criteria for evaluation are clear and organized, and we're able to better
> >>>> assess how the work in progress is meeting its goals (or not).
> >>>>
> >>>>
> >>>>
> >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >>>>
> >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap
> >>>>>>> backed cache. In the entire read path, we can refer to this offheap
> >>>>>>> buffer and avoid onheap copying.
> >>>>>
> >>>>> I think, on the read path, the most important improvement we could
> >>>>> imagine is eliminating or reducing object creation (KVs, iterators,
> >>>>> etc.): object reuse, byte buffer reuse or offheap buffer reuse, API
> >>>>> changes, etc. If this is part of this JIRA, then I would easily define
> >>>>> a goal: improving the 95th/99th percentile latency of read operations.
> >>>>> It's not performance but latency that matters.
> >>>>>
> >>>>> -Vlad
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>>>
> >>>>>> That's not a realistic or useful test scenario, unless the goal is to
> >>>>>> accelerate queries where all cells are filtered at the server.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> >>>>>>>
> >>>>>>> No Andy. 11425 has a doc attached to it. At the end of it, we have
> >>>>>>> added perf numbers from cluster testing. This was done using PE get
> >>>>>>> and scan tests with all cells filtered at the server (so as not to
> >>>>>>> factor in network bandwidth constraints).
> >>>>>>>
> >>>>>>> -Anoop-
> >>>>>>>
> >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> We have some microbenchmarks, not evidence of differences seen from a
> >>>>>>>> client application. I'm not saying that microbenchmarks aren't
> >>>>>>>> totally necessary and a great start (they are), but that they don't
> >>>>>>>> measure an end goal. Furthermore, unless I've missed one somewhere, we
> >>>>>>>> don't have a JIRA or design doc that states a clear end-goal metric
> >>>>>>>> like the strawman I threw together in my previous mail. A measurable
> >>>>>>>> system-level goal and some data from full cluster testing would go a
> >>>>>>>> lot further toward letting all of us evaluate the potential and payoff
> >>>>>>>> of the work. In the meantime we should probably be assembling these
> >>>>>>>> changes on a branch instead of in trunk, for as long as the goal is not
> >>>>>>>> clearly defined and the payoff and potential for perf regressions are
> >>>>>>>> untested and unknown.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Andy and Lars. The parent JIRA has a doc attached which
> >>>>>>>>> contains some perf gain numbers. We will be doing more tests in the
> >>>>>>>>> next 2 weeks (before the end of this month) and will publish them.
> >>>>>>>>> Yes, it would be great if it is at a more IST-friendly time :-)
> >>>>>>>>>
> >>>>>>>>> -Anoop-
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to
> >>>>>>>>>>> always argue both sides of a discussion and to never take sides
> >>>>>>>>>>> easily (drives some folks crazy).
> >>>>>>>>>>
> >>>>>>>>>> I can vouch for this (smile)
> >>>>>>>>>>
> >>>>>>>>>> I also can offer support for off heaping there. At the same time we
> >>>>>>>>>> do have a gap where we can't point to a timeline of improvements
> >>>>>>>>>> (yet, anyway) with benchmarks showing gains where your goals need
> >>>>>>>>>> them. For example, stock HBase in one JVM can address max N GB for
> >>>>>>>>>> response time distribution D; a dev version of HBase in the off heap
> >>>>>>>>>> branch can address max N' GB for distribution D', where N' > N and
> >>>>>>>>>> D > D' (distribution D' statistically shows better/lower response
> >>>>>>>>>> times).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <larsh@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I'm in favor of anything that improves performance (and preferably
> >>>>>>>>>>> doesn't set us back into a world that's worse than C due to the lack
> >>>>>>>>>>> of pointers in Java). I never said "I don't like it"; it's just that
> >>>>>>>>>>> I'm perhaps asking for more numbers and justification in weighing the
> >>>>>>>>>>> pros and cons.
> >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> >>>>>>>>>>> argue both sides of a discussion and to never take sides easily
> >>>>>>>>>>> (drives some folks crazy). And Stack's there too, he'll yell at me
> >>>>>>>>>>> where needed :)
> >>>>>>>>>>>
> >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a
> >>>>>>>>>>> fighting chance that folks on IST can participate. I know that some
> >>>>>>>>>>> of our folks on IST would love to participate in the backup
> >>>>>>>>>>> discussion.
> >>>>>>>>>>>
> >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
> >>>>>>>>>>> need an approximate number of folks.
> >>>>>>>>>>>
> >>>>>>>>>>> -- Lars
> >>>>>>>>>>>
> >>>>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com>
> >>>>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>
> >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> What time will it be on August 26th?
> >>>>>>>>>>> @Lars
> >>>>>>>>>>> Ya, I know that you are not generally in favour of this offheaping
> >>>>>>>>>>> stuff. Maybe if we (from India) can attend this meeting remotely,
> >>>>>>>>>>> your thoughts can be discussed, and also the current state of this
> >>>>>>>>>>> work.
> >>>>>>>>>>> Regards
> >>>>>>>>>>> Ram
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <larsh@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> >>>>>>>>>>> We have done a _lot_ of work on backups as well; ours are more
> >>>>>>>>>>> complicated, as we wanted fast per-tenant restores, so data is
> >>>>>>>>>>> "grouped" by tenant. Would like to sync up on that (hopefully some of
> >>>>>>>>>>> the folks who wrote most of the code will be in town, I'll check).
> >>>>>>>>>>>
> >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
> >>>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
> >>>>>>>>>>> Would like to add the following topics:
> >>>>>>>>>>>
> >>>>>>>>>>> - "Timestamp Resolution", or making space for more bits in the
> >>>>>>>>>>>   timestamps (happy to cover that, unless it's part of the "Time"
> >>>>>>>>>>>   topic)
> >>>>>>>>>>>
> >>>>>>>>>>> - "Replication". We found that replication cannot keep up with high
> >>>>>>>>>>>   write loads, because replication is strictly single-threaded per
> >>>>>>>>>>>   region server (even though we have multiple region servers on the
> >>>>>>>>>>>   sink side)
> >>>>>>>>>>>
> >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> >>>>>>>>>>>
> >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> >>>>>>>>>>>
> >>>>>>>>>>> -- Lars
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________
> >>>>>>>>>>> From: Sean Busbey <busbey@cloudera.com>
> >>>>>>>>>>> To: dev <dev@hbase.apache.org>
> >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I'm planning to be in the Bay Area the week of the 24th of August.
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Sean
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I can be up in your area in August.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <stack@duboce.net> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis.soz@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-athon.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'll be off starting the 25th of July, so I'd prefer something
> >>>>>>>>>>>>>> next week if possible.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> You ever coming back? If so, when? I'm back on the 10th of August
> >>>>>>>>>>>>> (Mikhail on the 20th).
> >>>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Enis
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <stack@duboce.net> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Matteo and I were thinking it was time devs got together for a
> >>>>>>>>>>>>>>> pow-wow. There is a bunch of stuff in flight at the moment (see the
> >>>>>>>>>>>>>>> list below), and it would be good to meet and whiteboard, surface
> >>>>>>>>>>>>>>> good ideas that have gone dormant in JIRA, or revisit
> >>>>>>>>>>>>>>> designs/proposals out in JIRA-attached Google docs that need
> >>>>>>>>>>>>>>> socializing.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You can only come if you
are wearing your bullshit hat.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Topics we'd go over could
include:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> + Our filesystem layout will not work with 1M regions (Matteo/Stack)
> >>>>>>>>>>>>>>> + Current state of the offheaping of the read path and alternate
> >>>>>>>>>>>>>>>   KeyValue implementation (Anoop/Ram)
> >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>>>>>>>>>>>> + Splitting meta/1M regions
> >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> >>>>>>>>>>>>>>> + Time (Enis)
> >>>>>>>>>>>>>>> + The overloaded SequenceId
(Stack)
> >>>>>>>>>>>>>>> + Upstreaming IT testing
(Dima/Sean)
> >>>>>>>>>>>>>>> + hbase-2.0.0
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to
> >>>>>>>>>>>>>>> take over a topic or put your name by one, just say. Suggest that
> >>>>>>>>>>>>>>> each discussion lead off with a 5-10 minute summary of the current
> >>>>>>>>>>>>>>> state of thought/design/implementation.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What do others think?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What date would suit folks?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Anyone want to host?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Matteo and St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Andy
> >>>>>>>>>>>>
> >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>>>>>>>>>>> (via Tom White)
> >>
>
