cassandra-dev mailing list archives

From "Brian O'Neill" <>
Subject Re: How is Cassandra being used?
Date Wed, 16 Nov 2011 14:11:20 GMT
Lively thread...

+1 opt-in
+1 in separate module

I'll just substantiate Rick Shaw's comments.  If this is on by default, I
can see it making its way into production at a large corporation, at which
time the traffic would sound an alarm as suspicious activity, which would
immediately get the server's plug pulled and trigger an investigation.
 That would land the architect responsible for deploying that server in the
proverbial principal's office.  In the extreme case, that might
"black-list" the technology and add fuel to any debate that the corporation
should just stick with the 'proven enterprise' solutions.  That is not my
perspective; just be aware that in some large corporations it is an uphill
battle to deploy Cassandra in the first place given incumbent systems.

In every situation I've been in, even outside of large corporations, we
would need to disable this feature given the sensitivity of the data.

All that said... I would love to see this data. ;)
I'd love to know where our deployment lies on the spectrum of use.

Maybe a good old fashioned web form that allows companies to submit their
usage scenarios might accomplish the same goal? (and you could get
additional context information about the industry, etc.)  It wouldn't be
comprehensive, but it may be sufficiently representative.  Maybe you could
just output a couple lines at server start that said something like "Go
here http://... to see how your usage compares to others."

I personally wouldn't throw too big a hissy fit if it was incorporated into the
actual server and on by default, but I certainly know others that would.


On Wed, Nov 16, 2011 at 7:17 AM, Eric Evans <> wrote:

> On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis <> wrote:
> > On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans <> wrote:
> >> I think this is potentially quite dangerous; there are a lot of people
> >> who get very twitchy at the idea of software that Phones Home.  I've
> >> seen this so many times, and in all cases it was for software a lot
> >> less sensitive than a database.
> >
> > True, but unlike most Home Phoners, ours will be out there in the open
> > and you can see exactly what it's sending (or not, if you disable it).
> >  I'm sure there are other examples of this in the wild, but the only one
> > I can think of is popcon [1].
>
> I don't think the transparency of the implementation changes things
> much.  It's still going to be opaque to a lot of folks, and more
> important is the precedent it sets and the way it changes the
> project/user trust relationship.
>
> Even if you're satisfied with the implementation, and trust that it
> won't be extended to transmit additional data later (unintentionally
> or otherwise), there are still very valid privacy concerns.  For
> example, seeing as how this must be transmitted over an IP network,
> there are only so many guarantees you can make with respect to
> anonymity.  There will always be *someone* that can tie the data to a
> unique IP, and an IP can almost always be tied to an individual or
> organization.  Imagine an organization that doesn't want *anyone* to
> know it uses Cassandra, and isn't willing to accept the risk that one
> of their admins might accidentally enable this reporting.
>
> It's also interesting that you mention popcon because it has always
> been contentious.  It's taken years for it to transition from the
> point where it required users to install it themselves, to a prompt at
> install-time that defaulted to "No", to the current state of an
> install-time prompt that defaults to "Yes".  And, the installer asks
> *very* few questions; Whether or not popcon is enabled is on par with
> partitioning and the assignment of a root password.
>
> Also, there should be no shame in the admission that we haven't earned
> anywhere near the level of trust and respect that the Debian project
> has.
>
> > More broadly, my sense is that people are getting used to the idea
> > that it's okay to give away anonymous statistics as part of the price
> > of "free," although YMMclearlyV. I am, after all, a Windows user. :)
>
> As privacy becomes more threatened people are either capitulating, or
> becoming even more defensive; Whether that makes it better or worse
> for us if we do this is debatable.
>
> >> I'm sure you've already considered this though, you're already talking
> >> about anonymity, and transparency, and what I assume is neutrality of
> >> the collection endpoint (can apache actually provide a VM; is that a
> >> thing?).
> >
> > Yes, they provide Ubuntu or FreeBSD VMs.
> >
> >> I'm just afraid that we'll scare people off before they can
> >> be properly convinced that it's all on the up-and-up.
> >
> > How would you propose addressing this?
>
> Honestly?  The best way to convince people that we take the privacy of
> their data seriously is to not transmit any of it to a machine outside
> their control.
>
> >> I'm curious to see what others think, but at the moment I'm hovering
> >> somewhere around a -0 if it were opt-in (off by default).
> >
> > I'm okay with opt-in if you think that's useful as a first step to
> > ease the twitchiness you mention, but longer term I think it's only
> > really useful if it's on by default. There's a lot of research that
> > shows that people tend to stick with whatever is the path of least
> > resistance [2], and specifically, my experience with Cassandra users
> > is exactly that -- one reason we've spent so much effort getting
> > defaults so good is because almost nobody goes beyond that.
>
> It's even worse than that.  It's not just that you'll be receiving
> less data, it will also be less meaningful (since it's from a
> self-selecting group).
>
> > [1]
> > [2]
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> >
> >
> --
> Eric Evans
> Acunu | | @acunu

Brian ONeill
Lead Architect, Health Market Science (
