cassandra-dev mailing list archives

From Sankalp Kohli <kohlisank...@gmail.com>
Subject Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)
Date Sun, 20 Nov 2016 17:25:02 GMT
This was not for the Dev list :)

> On Nov 20, 2016, at 09:06, Sankalp Kohli <kohlisankalp@gmail.com> wrote:
> 
> I have asked him to calm down as these things are never constructive for the community.
> Making personal comments puts him in a bad light more than anything else.
> I will speak with him in person when we are in office.
> 
> Thanks for keeping an eye on these things for us. I will set up another meeting with you
> to talk about Cassandra strategies.
> 
>> On Nov 20, 2016, at 06:50, Jason Brown <jasedbrown@gmail.com> wrote:
>> 
>> Hey all,
>> 
>> One of the goals on my team, when working on large patches, is to get
>> community feedback on these initiatives before throwing them into prod.
>> This gets us a wider net of feedback (see Sylvain's continuing excellent
>> rounds of feedback to my work on CASSANDRA-8457), as well as making sure we
>> don't go too far off the deep end in terms of straying from the community
>> version. The latter point is crucial because if we make too many
>> incompatible changes to, for example, the internode messaging protocol or
>> the CQL protocol or the sstable file format, and deploy that, it may be
>> very difficult, if not impossible, to rectify with future, in-development
>> versions of cassandra.
>> 
>> We fully intend to "engineer and test the snot out of" the changes we are
>> working on as the whole point of us working on them is so we *can* run them
>> in production, at our scale. We aren't expecting others in the community to
>> dog food it for us. There will be a delay between committing something
>> upstream, and us backporting it to a current version we run in production
>> and actually deploying it. However, you can be sure that any bugs we find
>> will be fixed ASAP; we have many users counting on it.
>> 
>> Thanks for listening,
>> 
>> -Jason
>> 
>> 
>> On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston <beggleston@apple.com>
>> wrote:
>> 
>>> I think Ed's just using gossip 2.0 as a hypothetical example. His point is
>>> that we should only commit things when we have a high degree of confidence
>>> that they work correctly, not with the expectation that they don't.
>>> 
>>> 
>>> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
>>> mkjellman@internalcircle.com) wrote:
>>> 
>>> Jason has asked for review and feedback many times. Maybe be constructive
>>> and review his code instead of just complaining (once again)?
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Nov 19, 2016, at 1:49 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>> 
>>>> I would say start with a mindset like 'people will run this in production'
>>>> not like 'why would you expect this to work'.
>>>> 
>>>> Now how does this logic affect feature development? Maybe use gossip 2.0 as
>>>> an example.
>>>> 
>>>> I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
>>>> the logic of 'don't expect it to work': unleash 4.0 onto hordes of newbies
>>>> with a Twitter announcement of the release and let bugs trickle in.
>>>> 
>>>> One could also do something comprehensive like test on clusters of 2 to
>>>> 1000 nodes. Test with Jepsen to see what happens during partitions, inject
>>>> things like JVM pauses and account for behavior. Log convergence times
>>>> after given events.
>>>> 
>>>> Take a stand and say look "we engineered and beat the crap out of this
>>>> feature. I deployed this release feature at my company and eat my dogfood.
>>>> You are not my crash test dummy."
>>>> 
>>>> 
>>>>> On Saturday, November 19, 2016, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>>>> 
>>>>> Any proposal to solve the problem you describe?
>>>>> 
>>>>> --
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>>>> 
>>>>>> This is especially relevant if people wish to focus on removing things.
>>>>>> 
>>>>>> For example, gossip 2.0 sounds great, but seems geared toward huge
>>>>>> clusters, which are not likely the majority of users. For those with a
>>>>>> 20 node cluster, are the indirect benefits worth it?
>>>>>> 
>>>>>> Also there seems to be a first push to remove things like compact storage
>>>>>> or thrift. Fine, great. But what is the realistic update path for someone?
>>>>>> If the big players are running 2.1 and maintaining backports, the average
>>>>>> shop without a dedicated team is going to be stuck saying (great features
>>>>>> in 4.0 that improve performance, I would probably switch but it's not
>>>>>> stable and we have that one compact storage cf and who knows what is going
>>>>>> to happen performance-wise when)
>>>>>> 
>>>>>> We really need to lose this "release won't be stable for 6 minor versions"
>>>>>> concept.
>>>>>> 
>>>>>> On Saturday, November 19, 2016, Edward Capriolo <edlinuxguru@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Friday, November 18, 2016, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>>>>>>>> 4.0-and-beyond here in a few minutes.
>>>>>>>> 
>>>>>>>> The advantage of a prod release every 6 months is less incentive to push
>>>>>>>> unfinished work into a release.
>>>>>>>> The disadvantage of a prod release every 6 months is that we either have
>>>>>>>> a very short lifespan per release, or we have to maintain lots of active
>>>>>>>> releases.
>>>>>>>> 
>>>>>>>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>>>>>>>> running it in prod – if we have a release every 6 months, that means we’d
>>>>>>>> be supporting 4+ releases at a time, just to keep parity with what we have
>>>>>>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>>>>>>>> old branches.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 11/18/16, 3:10 PM, "beggleston@apple.com on behalf of Blake
>>>>>>>> Eggleston" <beggleston@apple.com> wrote:
>>>>>>>> 
>>>>>>>>>> While stability is important if we push back large "core" changes
>>>>>>>>>> until later we're just setting ourselves up to face the same issues
>>>>>>>>>> later on
>>>>>>>>> 
>>>>>>>>> In theory, yes. In practice, when incomplete features are earmarked for
>>>>>>>>> a certain release, those features are often rushed out, and not always
>>>>>>>>> fully baked.
>>>>>>>>> 
>>>>>>>>> In any case, I don’t think it makes sense to spend too much time
>>>>>>>>> planning what goes into 4.0, and what goes into the next major release,
>>>>>>>>> with so many release strategy related decisions still up in the air. Are
>>>>>>>>> we going to ditch tick-tock? If so, what will its replacement look like?
>>>>>>>>> Specifically, when will the next “production” release happen? Without
>>>>>>>>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>>>>>>>>> 5.0, or whatever.
>>>>>>>>> 
>>>>>>>>> The reason I suggested a production release every 6 months is because
>>>>>>>>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>>>>>>>>> features to hit a given release, but not so frequent that it’s not
>>>>>>>>> practical to support. It wouldn’t be the end of the world if some of
>>>>>>>>> these tickets didn’t make it into 4.0, because 4.5 would be fine.
>>>>>>>>> 
>>>>>>>>> On November 18, 2016 at 1:57:21 PM, kurt Greaves (kurt@instaclustr.com)
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On 18 November 2016 at 18:25, Jason Brown <jasedbrown@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> #11559 (enhanced node representation) - decided it's *not* something
>>>>>>>>>> we need wrt #7544 storage port configurable per node, so we are
>>>>>>>>>> punting on
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> #12344 - Forward writes to replacement node with same address during
>>>>>>>>> replace depends on #11559. To be honest I'd say #12344 is pretty
>>>>>>>>> important, otherwise it makes it difficult to replace nodes without
>>>>>>>>> potentially requiring client code/configuration changes. It would be
>>>>>>>>> nice to get #12344 in for 4.0. It's marked as an improvement but I'd
>>>>>>>>> consider it a bug and thus think it could be included in a later minor
>>>>>>>>> release.
>>>>>>>>> 
>>>>>>>>>> Introducing all of these in a single release seems pretty risky. I
>>>>>>>>>> think it would be safer to spread these out over a few 4.x releases
>>>>>>>>>> (as they’re finished) and give them time to stabilize before including
>>>>>>>>>> them in an LTS release. The downside would be having to maintain
>>>>>>>>>> backwards compatibility across the 4.x versions, but that seems
>>>>>>>>>> preferable to delaying the release of 4.0 to include these, and having
>>>>>>>>>> another big bang release.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I don't think anyone expects 4.0.0 to be stable. It's a major version
>>>>>>>>> change with lots of new features; in the production world people don't
>>>>>>>>> normally move to a new major version until it has been out for quite
>>>>>>>>> some time and several minor releases have passed. Really, most people
>>>>>>>>> are only migrating to 3.0.x now. While stability is important if we
>>>>>>>>> push back large "core" changes until later we're just setting ourselves
>>>>>>>>> up to face the same issues later on. There should be enough uptake on
>>>>>>>>> the early releases of 4.0 from new users to help test and get it to a
>>>>>>>>> production-ready state.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kurt Greaves
>>>>>>>>> kurt@instaclustr.com
>>>>>>>> 
>>>>>>>> 
>>>>>>> I don't think anyone expects 4.0.0 to be stable
>>>>>>> 
>>>>>>> Someone previously described 3.0 as the "break everything release".
>>>>>>> 
>>>>>>> We know that many people are still on 2.1 and 3.0. Cassandra will always
>>>>>>> be maintaining 3 or 4 active branches and have adoption issues if
>>>>>>> releases are not stable and usable.
>>>>>>> 
>>>>>>> Given that Cassandra was 1.0 years ago, I expect things to be stable.
>>>>>>> Half-working features, or 'added this, broke that', are not appealing
>>>>>>> to me.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>>>>> than usual.
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>>>> than usual.
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>> 
