cassandra-dev mailing list archives

From Eric Evans <>
Subject Re: How is Cassandra being used?
Date Wed, 16 Nov 2011 12:17:07 GMT
On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis <> wrote:
> On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans <> wrote:
>> I think this is potentially quite dangerous; there are a lot of people
>> who get very twitchy at the idea of software that Phones Home.  I've
>> seen this so many times, and in all cases it was for software a lot
>> less sensitive than a database.
> True, but unlike most Home Phoners, ours will be out there in the open
> and you can see exactly what it's sending (or not, if you disable it).
>  I'm sure there's other examples in the wild of this, but the only one
> I can think of is popcon [1].

I don't think the transparency of the implementation changes things
much.  It's still going to be opaque to a lot of folks, and more
important is the precedent it sets and the way it changes the
project/user trust relationship.

Even if you're satisfied with the implementation, and trust that it
won't be extended to transmit additional data later (unintentionally
or otherwise), there are still very valid privacy concerns.  For
example, seeing as how this must be transmitted over an IP network,
there are only so many guarantees you can make with respect to
anonymity.  There will always be *someone* that can tie the data to a
unique IP, and an IP can almost always be tied to an individual or
organization.  Imagine an organization that doesn't want *anyone* to
know it uses Cassandra, and isn't willing to accept the risk that one
of their admins might accidentally enable this reporting.

It's also interesting that you mention popcon because it has always
been contentious.  It's taken years for it to transition from the
point where it required users to install it themselves, to a prompt at
install-time that defaulted to "No", to the current state of an
install-time prompt that defaults to "Yes".  And the installer asks
*very* few questions; whether or not popcon is enabled is on par with
partitioning and the assignment of a root password.

Also, there should be no shame in the admission that we haven't earned
anywhere near the level of trust and respect that the Debian project has.

> More broadly, my sense is that people are getting used to the idea
> that it's okay to give away anonymous statistics as part of the price
> of "free," although YMMclearlyV. I am, after all, a Windows user. :)

As privacy becomes more threatened, people are either capitulating or
becoming even more defensive; whether that makes it better or worse
for us if we do this is debatable.

>> I'm sure you've already considered this though, you're already talking
>> about anonymity, and transparency, and what I assume is neutrality of
>> the collection endpoint (can apache actually provide a VM; is that a
>> thing?).
> Yes, they provide Ubuntu or FreeBSD VMs.
>> I'm just afraid that we'll scare people off before they can
>> be properly convinced that it's all on the up-and-up.
> How would you propose addressing this?

Honestly?  The best way to convince people that we take the privacy of
their data seriously is to not transmit any of it to a machine outside
their control.

>> I'm curious to see what others think, but at the moment I'm hovering
>> somewhere around a -0 if it were opt-in (off by default).
> I'm okay with opt-in if you think that's useful as a first step to
> ease the twitchiness you mention, but longer term I think it's only
> really useful if it's on by default. There's a lot of research that
> shows that people tend to stick with whatever is the path of least
> resistance [2], and specifically, my experience with Cassandra users
> is exactly that -- one reason we've spent so much effort getting
> defaults so good is because almost nobody goes beyond that.

It's even worse than that.  It's not just that you'll be receiving
less data, it will also be less meaningful (since it's from a
self-selecting group).

> [1]
> [2]
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support

Eric Evans
Acunu | | @acunu
