hbase-dev mailing list archives

From Enis Söztutar <e...@apache.org>
Subject Re: DISCUSSION: lets do a developer workshop on near-term work
Date Wed, 12 Aug 2015 08:34:28 GMT
Agreed, too many fat topics, but all important. I guess we can spend the
first 10-20 mins on the agenda, based on who is in the room, come up with a
shorter list, and go from there.

Enis

On Tue, Aug 11, 2015 at 9:23 PM, Stack <stack@duboce.net> wrote:

> On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <syuanjiangdev@gmail.com>
> wrote:
>
> > [Let us move back to the main topic - a meeting to talk about the next
> > direction on HBASE development]
> >
> > Are we firm on the *August 26th* meeting date?
> >
> > Given the long list of topics from St.Ack, even a one day meeting might
> > not cover all of them (in depth).  We need to either trim the topic list
> > or limit the time to discuss a single topic (30 min for one topic enough?).
> >
> >
> Thanks for bringing us back to topic Stephen.
>
> Yes, lets do 26th. Speak up if this does not suit. I will file a meetup
> page in an hour or so. Where should we do it? Enis offered his nice place.
> Could try and get space at ours too... in Palo Alto (less 'deep south', a
> little easier for the SFers).
>
> As to too many topics, in my experience, a bunch of smelly engineers all in
> a room starts to fall apart after a couple of hours especially when ranging
> discussion. Suggest we cut the time-per-topic and list of topics so can do
> in an afternoon. If some topics are too fat, can do break out or put-off to
> another day and smaller, interested group.
>
> St.Ack
>
>
>
>
> > Thanks
> > Stephen
> >
> >
> > On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <anoop.hbase@gmail.com>
> wrote:
> >
> >> We will be doing some more large data tests in coming week Andy..   Will
> >> report back more.  Also will do a write up , in what all ways the work
> >> might help us.  As Sean said, we will continue in another thread if any
> >> thing further..  Will soon write back on the test result.  Thanks.
> >>
> >> -Anoop-
> >>
> >> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <andrew.purtell@gmail.com>
> >> wrote:
> >>
> >> > Cool, thanks.
> >> >
> >> > Is a 20% latency reduction the most we can expect or do you think
> >> > there is room for more improvement? Just curious.
> >> >
> >> > Is latency reduction the only goal? Anything here about supporting
> >> > larger heaps? Is there something we can measure in that regard?
> >> >
> >> > Hope you see my point and there's enough here to prime a goals and
> >> > metrics discussion at the pow wow or on the relevant JIRAs.
> >> >
> >> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan
> >> > > <ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >
> >> > > Hi Andy
> >> > >
> >> > > Based on the POCs we have done, we expect around a 20% improvement in
> >> > > latency. For scans it will be a little less than 20%.
> >> > >
> >> > > Regards
> >> > > Ram
> >> > >
> >> > >
> >> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell
> >> > > <andrew.purtell@gmail.com> wrote:
> >> > >
> >> > >> Hi Ram,
> >> > >>
> >> > >> Do you have any targets for what you are measuring? What are the
> >> > >> goals you guys are working toward with the off heaping changes?
> >> > >>
> >> > >>
> >> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> >> > >>> ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >>>
> >> > >>> Thanks Vladimir.
> >> > >>> Yeah, the reports that were attached specifically captured the
> >> > >>> 95/99th percentile.
> >> > >>> The reason for checking the server side perf was to specifically see
> >> > >>> the improvement in the server side and also the client was sending
> >> > >>> large results in multiple threads. So wanted to avoid the n/w
> >> > >>> interference. I think it was a general practice that we were following.
> >> > >>> We will do some more tests and get some latest readings with bigger
> >> > >>> data sets.
> >> > >>> Sent from mobile.
> >> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <
> >> andrew.purtell@gmail.com>
> >> > >> wrote:
> >> > >>>>
> >> > >>>> +1
> >> > >>>>
> >> > >>>> Yeah, something like that, with aspirational targets for
> >> > >>>> improvement from current releases. Then what to measure, the tests
> >> > >>>> to run, and criteria for evaluation are clear and organized and
> >> > >>>> we're able to better assess how the work in progress is meeting
> >> > >>>> its goals (or not)
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <
> >> > vladrodionov@gmail.com
> >> > >>>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap
> >> > >>>>> backed cache. In the entire read path, we can refer to this offheap
> >> > >>>>> buffer and avoid onheap copying.
> >> > >>>>>
> >> > >>>>> I think, on a read path, the most important improvement we could
> >> > >>>>> imagine is elimination or reduction of object creation (KVs,
> >> > >>>>> iterators etc.): object reuse, byte buffer reuse or offheap buffer
> >> > >>>>> reuse, API change etc.
> >> > >>>>> If this is a part of this JIRA, then I would easily define a goal:
> >> > >>>>> improving 95/99% latency of read operations. Not performance, but
> >> > >>>>> latency matters
> >> > >>>>>
> >> > >>>>> -Vlad
> >> > >>>>>
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> >> > >>>> andrew.purtell@gmail.com>
> >> > >>>>> wrote:
> >> > >>>>>
> >> > >>>>>> That's not a realistic or useful test scenario, unless the goal
> >> > >>>>>> is to accelerate queries where all cells are filtered at the
> >> > >>>>>> server.
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <
> anoop.hbase@gmail.com
> >> >
> >> > >>>> wrote:
> >> > >>>>>>>
> >> > >>>>>>> No Andy. 11425 has a doc attached to it. At the end of it, we
> >> > >>>>>>> have added perf numbers from cluster testing.  This was done
> >> > >>>>>>> using PE get and scan tests with filtering all cells at server
> >> > >>>>>>> (to not consider n/w bandwidth constraints)
> >> > >>>>>>>
> >> > >>>>>>> -Anoop-
> >> > >>>>>>>
> >> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell
<
> >> > >>>>>> andrew.purtell@gmail.com>
> >> > >>>>>>> wrote:
> >> > >>>>>>>
> >> > >>>>>>>> We have some microbenchmarks, not evidence of differences seen
> >> > >>>>>>>> from a client application. I'm not saying that microbenchmarks
> >> > >>>>>>>> are not totally necessary and a great start - they are - but
> >> > >>>>>>>> that they don't measure an end goal. Furthermore unless I've
> >> > >>>>>>>> missed one somewhere we don't have a JIRA or design doc that
> >> > >>>>>>>> states a clear end goal metric like the strawman I threw
> >> > >>>>>>>> together in my previous mail. A measurable system level goal
> >> > >>>>>>>> and some data from full cluster testing would go a lot further
> >> > >>>>>>>> toward letting all of us evaluate the potential and payoff of
> >> > >>>>>>>> the work. In the meantime we should probably be assembling
> >> > >>>>>>>> these changes on a branch instead of in trunk, for as long as
> >> > >>>>>>>> the goal is not clearly defined and the payoff and potential
> >> > >>>>>>>> for perf regressions is untested and unknown.
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop
John <
> >> anoop.hbase@gmail.com>
> >> > >>>> wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached which
> >> > >>>>>>>>> contains some perf gain numbers..  We will be doing more tests
> >> > >>>>>>>>> in next 2 weeks (before end of this month) and will publish
> >> > >>>>>>>>> them.   Yes it will be great if it is more IST friendly time :-)
> >> > >>>>>>>>>
> >> > >>>>>>>>> -Anoop-
> >> > >>>>>>>>>
> >> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew
Purtell <
> >> > >>>>>>>> andrew.purtell@gmail.com>
> >> > >>>>>>>>> wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
> >> > >>>>>>>>>>> always argue both side of a discussion and to never take
> >> > >>>>>>>>>>> sides easily (drives some folks crazy).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> I can vouch for this (smile)
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> I also can offer support for off heaping there. At the same
> >> > >>>>>>>>>> time we do have a gap where we can't point to a timeline of
> >> > >>>>>>>>>> improvements (yet, anyway) with benchmarks showing gains
> >> > >>>>>>>>>> where your goals need them. For example, stock HBase in one
> >> > >>>>>>>>>> JVM can address max N GB for response time distribution D;
> >> > >>>>>>>>>> dev version of HBase in off heap branch can address max N' GB
> >> > >>>>>>>>>> for distribution D', where N' > N and D > D' (distribution D'
> >> > >>>>>>>>>> statistically shows better/lower response times).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM,
lars hofhansl <
> >> larsh@apache.org>
> >> > >>>> wrote:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> I'm in favor of anything that improves performance (and
> >> > >>>>>>>>>>> preferably doesn't set us back into a world that's worse
> >> > >>>>>>>>>>> than C due to the lack of pointers in Java). Never said "I
> >> > >>>>>>>>>>> don't like it", it's just that I'm perhaps asking for more
> >> > >>>>>>>>>>> numbers and justification in weighing the pros and cons.
> >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
> >> > >>>>>>>>>>> always argue both side of a discussion and to never take
> >> > >>>>>>>>>>> sides easily (drives some folks crazy). And Stack's there
> >> > >>>>>>>>>>> too, he yell at me where needed :)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is
> >> > >>>>>>>>>>> a fighting chance that folks on IST can participate. I know
> >> > >>>>>>>>>>> that some of our folks on IST would love to participate in
> >> > >>>>>>>>>>> the backup discussion).
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd
> >> > >>>>>>>>>>> just need an approx. number of folks.
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> -- Lars
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> From: ramkrishna vasudevan
<
> >> ramkrishna.s.vasudevan@gmail.com>
> >> > >>>>>>>>>>> To: "dev@hbase.apache.org"
<dev@hbase.apache.org>; lars
> >> > >> hofhansl <
> >> > >>>>>>>>>> larsh@apache.org>
> >> > >>>>>>>>>>> Sent: Wednesday, July 15,
2015 10:10 AM
> >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets
do a developer workshop on
> >> > >> near-term
> >> > >>>>>> work
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Hi
> >> > >>>>>>>>>>> What time will it be on August 26th?
> >> > >>>>>>>>>>> @Lars Ya. I know that you are not generally in favour of
> >> > >>>>>>>>>>> this offheaping stuff.  May be if we (from India) can attend
> >> > >>>>>>>>>>> this meeting remotely your thoughts can be discussed and
> >> > >>>>>>>>>>> also the current state of this work.
> >> > >>>>>>>>>>> Regards
> >> > >>>>>>>>>>> Ram
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28
PM, lars hofhansl <
> >> > larsh@apache.org
> >> > >>>
> >> > >>>>>>>> wrote:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of
> >> > >>>>>>>>>>> August 9th.
> >> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are
> >> > >>>>>>>>>>> more complicated as we wanted fast per-tenant restores, so
> >> > >>>>>>>>>>> data is "grouped" by tenant. Would like to sync up on that
> >> > >>>>>>>>>>> (hopefully some of the folks who wrote most of the code will
> >> > >>>>>>>>>>> be in town, I'll check).
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although
> >> > >>>>>>>>>>> you folks usually do not like what I think about the offheap
> >> > >>>>>>>>>>> efforts :) ).
> >> > >>>>>>>>>>> Would like to add the following topics:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in
> >> > >>>>>>>>>>> the timestamps (happy to cover that, unless it's part of the
> >> > >>>>>>>>>>> "Time" topic)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> - "Replication". We found that replication cannot keep up
> >> > >>>>>>>>>>> with high write loads, due to the fact that replication is
> >> > >>>>>>>>>>> strictly single threaded per regionserver (even though we
> >> > >>>>>>>>>>> have multiple region servers on the sink side)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> -- Lars
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> ________________________________
> >> > >>>>>>>>>>> From: Sean Busbey <busbey@cloudera.com>
> >> > >>>>>>>>>>> To: dev <dev@hbase.apache.org>
> >> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015
7:11 PM
> >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets
do a developer workshop on
> >> > >> near-term
> >> > >>>>>> work
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> I'm planning to be in the
Bay area the week of the 24th of
> >> > >> August.
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> --
> >> > >>>>>>>>>>> Sean
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM,
"Andrew Purtell" <
> >> > apurtell@apache.org>
> >> > >>>>>>>> wrote:
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> I can be up in your area
in August.
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> On Tue, Jul 14,
2015 at 5:31 PM, Stack <
> stack@duboce.net
> >> >
> >> > >>>> wrote:
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> On Tue, Jul 14,
2015 at 3:39 PM, Enis Söztutar <
> >> > >>>>>> enis.soz@gmail.com>
> >> > >>>>>>>>>>>>> wrote:
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-aton.
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next
> >> > >>>>>>>>>>>>>> week if possible.
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
> >> > >>>>>>>>>>>>> (Mikhail on the 20th).
> >> > >>>>>>>>>>>>> St.Ack
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> Enis
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> On Tue, Jul
14, 2015 at 3:18 PM, Stack <
> >> stack@duboce.net>
> >> > >>>> wrote:
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for
> >> > >>>>>>>>>>>>>>> a pow-wow. There is a bunch of stuff in flight at the
> >> > >>>>>>>>>>>>>>> moment (see below list) and it would be good to meet and
> >> > >>>>>>>>>>>>>>> whiteboard, surface good ideas that have gone dormant in
> >> > >>>>>>>>>>>>>>> JIRA, or revisit designs/proposals out in JIRA-attached
> >> > >>>>>>>>>>>>>>> google doc that need socializing.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
> >> > >>>>>>>>>>>>>>> (Matteo/Stack)
> >> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path and
> >> > >>>>>>>>>>>>>>> alternate KeyValue implementation (Anoop/Ram)
> >> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> >> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> >> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> >> > >>>>>>>>>>>>>>> + Time (Enis)
> >> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> >> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >> > >>>>>>>>>>>>>>> + hbase-2.0.0
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If
> >> > >>>>>>>>>>>>>>> you want to take over a topic or put your name by one,
> >> > >>>>>>>>>>>>>>> just say. Suggest that discussion lead off with a
> >> > >>>>>>>>>>>>>>> 5-10minute on current state of
> >> > >>>>>>>>>>>>>>> thought/design/implementation.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> What do others think?
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> What date would suit folks?
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Anyone want to host?
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Thanks,
> >> > >>>>>>>>>>>>>>> Matteo and St.Ack
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> --
> >> > >>>>>>>>>>>> Best regards,
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> - Andy
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. -
> >> > >>>>>>>>>>>> Piet Hein (via Tom White)
> >> > >>
> >> >
> >>
> >
> >
>
