cassandra-dev mailing list archives

From Roland Gude <>
Subject AW: How is Cassandra being used?
Date Wed, 16 Nov 2011 10:29:00 GMT
I think it is a very good idea to gather such information, to make it easy for the users
who want to take part (or who don't care), and to take the "twitchiness" into account as well.
How about putting the reporting code in a separate module/jar, reporting statistics if the
jar is there and not reporting if it is missing (similar to how the native features are used
only if JNA is there)?
Then one could provide two binary archives on the homepage: one with the jar and one without.
That way, people who do not want the code simply pick the archive where it is not included
or delete the jar.
You could (and should) even put an "Enable/Disable" switch on top of it.
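
To make the idea concrete, here is a rough sketch of how the check might look. It is only an
illustration, not existing Cassandra code: the class name org.example.reporting.UsageReporter
and the system property reporting.enabled are made up for this example. Reporting is active
only if the reporter class can be found on the classpath and the switch is turned on.

```java
/**
 * Hypothetical sketch: statistics are reported only when an optional
 * reporting jar is on the classpath AND an explicit switch is enabled.
 */
final class OptionalReporting {
    // Assumed name of a class that would ship in the separate reporting jar.
    static final String REPORTER_CLASS = "org.example.reporting.UsageReporter";

    /** Returns true if the class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalReporting.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar missing (or unusable): treat reporting as unavailable.
            return false;
        }
    }

    /** Both conditions must hold: the switch is on and the jar is present. */
    static boolean shouldReport(boolean enabledSwitch, String reporterClass) {
        return enabledSwitch && isOnClasspath(reporterClass);
    }

    public static void main(String[] args) {
        // Assumed switch; defaults to "false", so reporting is off unless requested.
        boolean enabled = Boolean.parseBoolean(System.getProperty("reporting.enabled", "false"));
        System.out.println(shouldReport(enabled, REPORTER_CLASS) ? "reporting on" : "reporting off");
    }
}
```

With this shape, deleting the jar or leaving the switch off would each be enough to keep the
node from reporting anything.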

-----Original Message-----
From: Dave Brosius []
Sent: Wednesday, 16 November 2011 02:25
Subject: Re: How is Cassandra being used?

+1 for an opt-in approach. To get better opt-in rates, perhaps prompt for it on start (once)
rather than hope folks find it buried in the yaml.

Eric Evans <> wrote:

>On Tue, Nov 15, 2011 at 11:23 PM, Jonathan Ellis <> wrote:
>> I started a "users survey" thread over on the users list (replies are
>> still trickling in), but as useful as that is, I'd like to get
>> feedback that is more quantitative and with a broader base.  This will
>> let us prioritize our development efforts to better address what
>> people are actually using it for, with less guesswork.  For instance:
>> we put a lot of effort into compression for 1.0.0; if it turned out
>> that only 1% of 1.0.x users actually enable compression, then it means
>> that we should spend less effort fine-tuning that moving forward, and
>> use the energy elsewhere.
>> (Of course it could also mean that we did a terrible job getting the
>> word out about new features and explaining how to use them, but either
>> way, it would be good to know!)
>> I propose adding a basic cluster reporting feature to cassandra.yaml,
>> enabled by default.  It would send anonymous information about your
>> cluster to an VM.  Information like, number (but not names)
>> of keyspaces and columnfamilies, ks-level options like compression, cf
>> options like compaction strategy, data types (again, not names) of
>> columns, average row size (or better: the histogram data), and average
>> sstables per read.
>> Thoughts?
>I think this is potentially quite dangerous; there are a lot of people
>who get very twitchy at the idea of software that Phones Home.  I've
>seen this so many times, and in all cases it was for software a lot
>less sensitive than a database.
>I'm sure you've already considered this though, you're already talking
>about anonymity, and transparency, and what I assume is neutrality of
>the collection endpoint (can apache actually provide a VM; is that a
>thing?).  I'm just afraid that we'll scare people off before they can
>be properly convinced that it's all on the up-and-up.
>I'm curious to see what others think, but at the moment I'm hovering
>somewhere around a -0 if it were opt-in (off by default).
>Eric Evans
>Acunu | | @acunu