cassandra-user mailing list archives

From Mark Thomas <ma...@apache.org>
Subject Re: Partition size
Date Mon, 12 Sep 2016 11:10:50 GMT
On 09/09/2016 21:11, Benedict Elliott Smith wrote:
> Come on. This kind of inconsistent 'policing' is not helpful.

How is it inconsistent? Since I subscribed to the mailing list on 22
August, this is the first instance I have seen of anyone providing a
link to third party docs rather than the equivalent project hosted docs
in response to a user question. If I missed any, please point them out.
The lists are pretty busy and that, combined with my minimal technical
knowledge of Cassandra, means it is perfectly possible I missed some.

I've done a quick double check of the user@ archives and while I do see
a number of messages referencing 3rd party docs, those references were
made by the OP rather than someone from the community providing an answer.

> By all means, push the /*committers*/ to improve the project docs as is
> happening, and to promote the internal resources over external ones.
> 
> But Mark has absolutely no formal connection with the project, and his
> contributions have only been to file a couple of JIRA (all of which have
> so far been ignored by those of his colleagues who /are/ active
> community members, I'll note!).  Shaming him for not linking docs that
> describe something /other/ than what he was even talking about is
> crossing the line IMO.  

Any member of a project community (contributor, committer or PMC member)
directing users to 3rd party docs in preference to project docs without
a good reason is missing an opportunity to strengthen that project
community.

> Linking to third-party resources is commonplace, the only difference I
> can see here is that these have been called "docs"  by the authors,
> instead of a blog post, and Mark has a DataStax email address.

Linking to third party reference docs for an Apache project in response
to a configuration question about that Apache project on one of the
project's mailing lists is pretty unusual.

Linking to third party docs, blogs, etc is fairly common but they tend
to be linked by the OP in the form of "I've followed the instructions I
found here and it doesn't work". The responses to such questions
typically include links to the relevant parts of the Apache hosted docs.

If the question is more involved then I have seen links to blogs,
presentations, YouTube etc provided as an answer. If this happens
multiple times for the same topic then it is usually added to an FAQ,
wiki or similar along with an e-mail to the author to see if they'd be
willing to contribute something to the docs.

> Would you have reacted this way if Aaron Morton linked a blog post by
> thelastpickle?  Or a random user posted their own resources?  Obviously not.

Wrong. My reaction was based on the content of the message (a link to
3rd party docs in response to a question when an equivalent link to
project hosted docs was available) not on who sent it or their employer.

> I was initially all for the ASF endeavour to counteract DataStax'
> outsized influence on the project, and was hopeful you might achieve
> some positive change.  Perhaps you may well still do.  But it seems to
> me that the ASF behaviour is beginning to cross from constructive
> criticism of the project participants to prejudicially hostile behaviour
> against certain community members - and that is unlikely to result in a
> better project.
> 
> You should be treating everyone consistently, in a manner that promotes
> project health.

It is not healthy if community members are directing users to 3rd party
documentation in preference to the project's own documentation. If it is
happening because the project's documentation is non-existent / wrong /
poorly written / etc. then that is understandable (and would be an issue
the project needed to address) but that was not the case in this instance.

There are many aspects to community health. In the grand scheme of
things the single e-mail that started this particular discussion is in
the noise. However, a consistent pattern of such e-mails would be much
more troubling. My intent was to ensure that such a pattern did not form.

Whether people agree with my response or not, the community is hopefully
more aware of the issue than it was previously.

Mark


> On Friday, 9 September 2016, Mark Thomas <markt@apache.org> wrote:
> 
>     On 09/09/2016 16:46, Mark Curtis wrote:
>     > If your partition sizes are over 100MB iirc then you'll normally see
>     > warnings in your system.log, this will outline the partition key, at
>     > least in Cassandra 2.0 and 2.1 as I recall.
>     >
>     > Your best friend here is nodetool cfstats which shows you the
>     > min/mean/max partition sizes for your table. It's quite often used to
>     > pinpoint large partitions on nodes in a cluster.
>     >
>     > More info
>     > here:
>     https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCFstats.html
> 
>     Folks,
> 
>     It is *Apache* Cassandra. If you are going to point to docs, please
>     point to the official Apache docs unless there is a very good reason
>     not to.
> 
>     In this case:
> 
>     http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#compaction_large_partition_warning_threshold_mb
> 
>     looks to be the place.
> 
>     Mark
> 
> 
>     >
>     > Thanks
>     >
>     > Mark
>     >
>     >
>     > On 9 September 2016 at 02:53, Anshu Vajpayee <anshu.vajpayee@gmail.com> wrote:
>     >
>     >     Is there any way to get partition size for a partition key?
>     >
>     >
> 

