spark-dev mailing list archives

From Reynold Xin <r...@apache.org>
Subject Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends
Date Sun, 17 Apr 2016 06:12:13 GMT
First, thank you very much for leading the discussion.

I am concerned that it'd hurt Spark more than it helps. As many others have
pointed out, this unnecessarily creates a new tier of connectors or third-party
libraries that appear to be endorsed by the Spark PMC or the ASF. We
can alleviate this concern by not having "Spark" in the name, and the
project proposal and documentation should clearly state that this is not
affiliated with Spark.

Also, Luciano: assuming you are interested in creating a project like this
and finding a home for the connectors that were removed, I find it surprising
that few of the initially proposed PMC members have actually contributed
much to the connectors, while people who have contributed a lot were left
out. I am sure that is just an oversight.



On Sat, Apr 16, 2016 at 10:42 PM, Luciano Resende <luckbr1975@gmail.com>
wrote:

>
>
> On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan <velvia.github@gmail.com>
> wrote:
>
>> Hi folks,
>>
>> Sorry to join the discussion late.  I had a look at the design doc
>> earlier in this thread, and it was not mentioned what types of
>> projects are the targets of this new "spark extras" ASF umbrella....
>>
>> Is the desire to have a maintained set of Spark-related projects that
>> keep pace with the main Spark development schedule? Is it just for
>> streaming connectors? What about data sources and other important
>> projects in the Spark ecosystem?
>>
>
> The proposal draft below has more details on what types of projects are in
> scope, but in summary, "Spark-Extras" would be a good place for any of the
> components you mentioned.
>
>
> https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing
>
>
>>
>> I'm worried that this would relegate spark-packages to third-tier
>> status,
>
>
> Owen answered a similar question about spark-packages earlier in this
> thread: while "Spark-Extras" would be a place at Apache for collaboration
> on the development of these extensions, they might still be published to
> spark-packages, as the existing streaming connectors are today.
>
>
>> and the promotion of a select set of committers, and the
>> project itself, to top level ASF status (a la Arrow) would create a
>> further split in the community.
>>
>>
> As for the select set of committers, we have invited all Spark committers
> to be committers on the project, and I have updated the project proposal
> with the existing set of active Spark committers (those who have committed
> in the last year).
>
>
>>
>> -Evan
>>
>> On Sat, Apr 16, 2016 at 4:46 AM, Steve Loughran <stevel@hortonworks.com>
>> wrote:
>> >
>> > On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <chris.a.mattmann@jpl.nasa.gov> wrote:
>> >
>> >>Yeah, in support of this statement, my primary interest in this Spark
>> >>Extras effort and the good work by Luciano here is that any time we
>> >>take bits out of a code base and “move it to GitHub”, I see a bad
>> >>precedent being set.
>> >>
>> >>Creating this project at the ASF creates a synergy with *Apache Spark*,
>> >>which is *at the ASF*.
>> >>
>> >>We welcome comments and as Luciano said, this is meant to invite and be
>> >>open to those in the Apache Spark PMC to join and help.
>> >>
>> >>Cheers,
>> >>Chris
>> >
>> > As one of the people named, here's my rationale:
>> >
>> > Throwing stuff into GitHub creates that world of branches, and it's no
>> > longer something that can be managed through the ASF, where "managed"
>> > means governance, participation, and a release process that includes
>> > auditing dependencies, code sign-off, etc.
>> >
>> >
>> > As an example, there's a mutant Hive JAR that Spark uses, which has
>> > evolved between my repo and Patrick Wendell's. Now that Josh Rosen has
>> > taken on the bold task of "trying to move spark and twill to Kryo 3", he's
>> > going to own that code, and the reference branch will move somewhere else.
>> >
>> > In contrast, if there were an ASF location for this, it'd be something
>> > anyone with commit rights could maintain and publish.
>> >
>> > (Actually, I've just realised life is hard here, as the Hive JAR is a fork
>> > of ASF Hive; really, the Spark branch should be a separate branch in Hive's
>> > own repo. But the concept is the same: those bits of the codebase that
>> > are core parts of the Spark project should really live in or near it.)
>> >
>> >
>> > If everyone on the Spark commit list gets write access to this extras
>> > repo, moving things is straightforward. Release-wise, things could/should
>> > be in sync.
>> >
>> > If there's a risk, it's the eternal problem of the contrib/ dir: stuff
>> > ends up there that never gets maintained. I don't see that being any
>> > worse than if things were thrown to the wind of a thousand GitHub repos;
>> > at least there'd be a central issue-tracking location.
>>
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
