cassandra-dev mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Cassandra Java Driver and DataStax
Date Mon, 06 Jun 2016 22:14:01 GMT

> On Jun 5, 2016, at 4:33 PM, Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov> wrote:
> 
> Thanks for the info Jonathan. I think I have assessed based on
> the replies thus far, my studying of the archives and
> commit and project history the following situation.
> 
> Unfortunately it seems like there is a bit of control going on.
> I’m going to call a spade a spade here. A key portion of your
> software’s stack, a client driver to use it, exists outside of
> Apache in separate communities. This is an inherent risk to the
> project. Some of you cite flexibility and adaptability as reasons
> for this - I’ve seen it in so many communities over the last 12+
> years in the foundation - it’s not really due to those issues.

Not all open-source projects do well under the Apache umbrella, in my opinion.  Additionally,
not all library dependencies for all Apache projects come from Apache.

> There is definitely some control going on.

One thing I like about the ASF is that it’s about contribution and meritocracy.  If you
have a company that is devoted to making a project successful, you’ll have more contribution
from them.  Some people will gravitate to work for that company because they are passionate
about the project and working there allows them to spend more time on it than they would have
been able to at other companies.  And yet there are several committers and PMC members that
don’t work for DataStax who have an influence over its development.  I think you may mean
control in terms of contribution as you talk about in your next questions.  If that’s the
case, how do you get other people to contribute more?  DataStax has already sponsored several
contributor bootcamps for instance.  If it’s about contribution and meritocracy, is there
an instance where a contribution was not accepted because of where an individual was employed?
Is there an instance where someone wasn’t accepted as a contributor or committer or PMC
member because of where they worked?  Several of those committers/PMC members who are currently
at DataStax became committers/PMC members before joining DataStax.  I’m just trying to understand
the nature of where you see a problem.

> I would ask you all
> this - has there been a PR or patch in the past year or two that
> wasn’t singularly reviewed by DataStax committers and PMC? Also,
> as to the composition of the PMC when was the last time a non 
> DataStax person was elected to the PMC and/or as a committer?
> 
> By itself the diversity issues alone are not damning to the 
> project, but taken together with the citation to other project
> communities even those outside of Apache (e.g., the comments
> well “Postgres does it this way, so it’s a good example to
> compare us to” or “these other 4 projects at the ASF do it 
> like this, so X”.. [sic]) and with the perception being created
> to those that don’t work at DataStax, and there is an issue here.

I don’t quite understand the thinking here - referencing how successful open-source projects
operate (outside the ASF) is damning?  Mentioning that Cassandra is not unique within Apache
in having client drivers outside of the main project is damning?  I don’t understand why
either of those would be in any way negative or invalid.

Regarding the thread generally, I still haven’t seen 1) an instance where having client
drivers developed outside of the core project has been a problem or 2) an example where employees
of DataStax exerted control over the project to the detriment of others or 3) an example of
anyone in the community saying “yeah, you’re right, they do control stuff and it sucks.”

Personally, I would like to see more specific concrete evidence of a problem.  I’ve been
involved with the project since 0.6 and it’s always had a very open and active community
complete with ideas, disagreements, and mistakes.  Take for example CASSANDRA-9666 where the
best time series compaction strategy alternatives are discussed.  It was determined that a
community contribution from a third-party company was going to supersede/replace what was
in the tree.  If there are specific concerns, I think everyone involved in the project would
like to know.

> 
> I would like to see a discussion in your next board report about
> the diversity and health issues of the project, and also some 
> ideas about potential strategies for mitigation. 
> 
> I appreciate the open and honest conversation thus far. Let’s
> keep it up.
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> On 6/5/16, 1:51 PM, "Jonathan Ellis" <jbellis@gmail.com> wrote:
> 
>> On Sun, Jun 5, 2016 at 8:32 AM, Mattmann, Chris A (3980) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>> 
>>> 1. Is Apache Cassandra useful *without* a driver? That is, can
>>> you use the database without a driver to connect to it or in the
>>> real world would your users all have to download at least one
>>> driver in order to use the DB?
>>> 
>> 
>> The users do need to download a driver--but this is pretty normal for
>> community-driven OSS databases.  Besides the Apache projects I listed,
>> PostgreSQL also runs on a community-maintained driver model.
>> 
>> 
>>> 2. To confirm again, at one point at least the Java driver code
>>> lived in the code-base, and further, at one point, people did
>>> submit some patches to add drivers, but the PMC didn’t want to
>>> maintain that code (and apparently they didn’t want to create
>>> any new PMC members and/or committers to do so) and so thus
>>> people started their own new projects? That right?
>>> 
>> 
>> I think that summary over-emphasizes the governance aspect at the expense
>> of more important considerations:
>> 
>> 0. The very first Cassandra driver interface was Thrift.  No Thrift clients
>> were ever part of the Cassandra tree.
>> 
>> 1. When we created the CQL protocol, we initially had a Java driver in tree
>> as a reference implementation.
>> 
>> 2. But due primarily to the project management issues mentioned by Nate,
>> and secondarily to the governance aspects above, we moved quickly back to
>> the pure community-driven drivers approach that had worked for us before.
>> 
>> 2a. While some Apache databases do ship a Java driver in tree, I think that
>> this hinders adoption because it signals to users that non-Java drivers are
>> second-class citizens.  (No doubt this is not the *intent* of that
>> decision, but it is a likely consequence nevertheless.)
>> 
>> 2b. DataStax saw CQL adoption as a key driver for Cassandra adoption and
>> hence its own success, and hired a team to accelerate the production of
>> drivers for the new CQL protocol.  These drivers are Apache licensed and
>> see broad community participation, e.g. with ~70 contributors to the Java
>> driver.
>> 
>> 2c. Neither has DataStax "sucked the oxygen out of the room."  Lots of
>> non-DataStax drivers exist as well.
>> 
>> As Aleksey pointed out earlier, I don't see anyone being harmed by this
>> state of affairs.  Cassandra PMC doesn't want to run drivers projects,
>> driver authors don't want to be run by Cassandra PMC, and meanwhile users
>> have Apache licensed drivers that let them be productive with Cassandra.

