cassandra-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)
Date Sat, 05 Nov 2016 19:07:39 GMT
On Sat, Nov 5, 2016 at 9:19 AM, Benedict Elliott Smith <benedict@apache.org>
wrote:

> Hi Ed,
>
> I would like to try and clear up what I perceive to be some
> misunderstandings.
>
> Aleksey is relating that for *complex* tickets there are desperately few
> people with the expertise necessary to review them.  In some cases it can
> amount to several weeks' work, possibly requiring multiple people, which is
> a huge investment.  EPaxos is an example where its complexity likely needs
> multiple highly qualified reviewers.
>
> Simpler tickets on the other hand languish due to poor incentives - they
> aren't sexy for volunteers, and aren't important for the corporately
> sponsored contributors, who also have finite resources.  Nobody *wants* to
> do them.
>
> This does contribute to an emergent lack of diversity in the pool of
> contributors, but it doesn't discount Aleksey's point.  We need to find a
> way forward that handles both of these concerns.
>
> Sponsored contributors have invested time into efforts to expand the
> committer pool before, though they have universally failed.  Efforts like
> the "low hanging fruit squad" seem like a good idea that might pay off, with
> the only risk being the cloud hanging over the project right now.  I think
> constructive engagement with potential sponsors is probably the way
> forward.
>
> (As an aside, the policy on test coverage was historically very poor
> indeed, but is I believe much stronger today - try not to judge current
> behaviours on those of the past)
>
>
> On 5 November 2016 at 00:05, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
> > "I’m sure users running Cassandra in production would prefer actual
> > proper reviews to non-review +1s."
> >
> > Again, you are implying that only you can do a proper job.
> >
> > Let's be specific here: you and I are working on this one:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-10825
> >
> > Now, Ariel reported there was no/low code coverage. I went looking at the
> > code and found a problem.
> >
> > If someone were to merge this, I would have more incentive to look for
> > other things, and then I might find more bugs and improvements. If this
> > process keeps going, I would naturally get exposed to more of the code.
> > Finally, maybe (I don't know, in 10 or 20 years) I could become one of
> > these specialists.
> >
> > Let's peel this situation apart:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-10825
> >
> > "If you grep test/src and cassandra-dtest you will find that the string
> > OverloadedException doesn't appear anywhere."
> >
> > Now let me flip this situation around:
> >
> > "I'm sure the users running Cassandra in production would prefer proper
> > coding practices like writing unit and integration tests to rubber-stamp
> > merges."
> >
> > When the shoe is on the other foot it does not feel so nice.
> >
> >
> > On Fri, Nov 4, 2016 at 7:08 PM, Aleksey Yeschenko <aleksey@apache.org>
> > wrote:
> >
> > > Dunno. A sneaky correctness or data corruption bug. A performance
> > > regression. Or something that can take a node/cluster down.
> > >
> > > Of course no process is bullet-proof. The purpose of review is to
> > minimise
> > > the odds of such a thing happening.
> > >
> > > I’m sure users running Cassandra in production would prefer actual
> > > proper reviews to non-review +1s.
> > >
> > > --
> > > AY
> > >
> > > On 4 November 2016 at 23:03:23, Edward Capriolo (edlinuxguru@gmail.com)
> > > wrote:
> > >
> > > I feel that is really standing up on a soapbox. What would be the
> > > worst thing that happens here?
> >
>

Benedict,

Well said. I think we both see a similar way forward.

"Sponsored contributors have invested time into efforts to expand the
committer pool before, though they have universally failed."

Let's talk about this. I am following a number of tickets. Take, for
example, this one.

https://issues.apache.org/jira/browse/CASSANDRA-12649

September 19th: User submits a patch along with a clear rationale (it is
right in the description of the ticket).

October 19th: (me) +1 (non-binding): users with unpredictable batch sizes
tend to also have GC problems, and this would aid in insight.

October 28th: Someone else: Would be nice to see this committed. We have
seen a lot of users mistakenly batch against multiple partitions.

Note: 3 people have agreed they see this as useful.

November 1st
So rebased the patch on 3.X and one of the added unit tests actually
exposed a bug that was just introduced in CASSANDRA-12060
<https://issues.apache.org/jira/browse/CASSANDRA-12060>. Attached new,
rebased patch here, however doubtful it's going to make it into 3.10.

Also raised CASSANDRA-12867
<https://issues.apache.org/jira/browse/CASSANDRA-12867> to cover the bug.

Note: Did a test in this patch uncover something else?

yesterday

I discussed the patch with Aleksey Yeschenko
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=iamaleksey> and
we both have some concerns with the performance impact of measuring the
mutation size. Could you provide a benchmark to assess the performance
impact of that measurement?

I am also not fully convinced by the usefulness of Logged / Unlogged
Partitions per batch distribution. Could you explain in more detail how it
will be useful for you?

So:
Three people have already explained why this is directly useful to them.
Beyond its direct usefulness, the unit tests inside the patch have already
uncovered another issue.

The contributor is asked to assess the performance impact. Maybe I am not
understanding correctly, but I have measured the round trip of large batch
mutations at around 200 ms. I do not see how a counter update in the
nanoseconds could even be a factor. Even so, no one has explained how to
construct this benchmark: what are we benchmarking (CPU? memory?), and are
we using stress and counting with and without the patch?

If it were me and I wanted to grow the committer pool, I would micro
bench/smoke test it myself in 10-30 minutes or so. (It is not like it is a
feature that you cannot pull out later anyway.) Once you have smoke tested
it and found no relative performance impact, you can just say to yourself,
"Whatever, I don't know if I see the point, but it's like a 6-line patch,
there are like 1000 other counters anyway, these three other people seem to
think it is useful, and the tests actually add more coverage, so LGTM."
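A 10-30 minute smoke test of that kind could be as simple as the sketch
below. This is my own illustration, not the actual patch: it uses a plain
LongAdder as a stand-in for whatever metrics object the patch updates per
mutation, and compares the per-update cost against the ~200 ms round trip
measured above.

```java
import java.util.concurrent.atomic.LongAdder;

public class CounterOverheadSmokeTest {
    public static void main(String[] args) {
        // Stand-in for the batch-size metric the patch would update per mutation.
        LongAdder batchSizeCounter = new LongAdder();
        int iterations = 10_000_000;

        // Warm up so the JIT compiles the loop before we time it.
        for (int i = 0; i < iterations; i++) {
            batchSizeCounter.add(i);
        }

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            batchSizeCounter.add(i);
        }
        long perUpdateNanos = (System.nanoTime() - start) / iterations;

        // A large-batch round trip of ~200 ms is 200,000,000 ns, so even a
        // pessimistic 100 ns per update is roughly 0.00005% of one round trip.
        System.out.println("per-update cost: ~" + perUpdateNanos + " ns");
        System.out.println("fraction of a 200 ms round trip: "
                + (perUpdateNanos / 200_000_000.0));
    }
}
```

It is not a rigorous benchmark (a real one would use something like JMH),
but it is enough to rule the measurement in or out as a factor against a
200 ms round trip.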
