From: Peter Tillotson
To: "user@cassandra.apache.org"
Date: Thu, 3 Nov 2011 12:46:33 +0000 (GMT)
Subject: Re: Second Cassandra users survey

I'm using Cassandra as a big graph database, loading large volumes of data live and linking it on the fly. The number of edges grows geometrically with the data added, and the edges need to be read back to continue linking the graph on the fly.

Consequently, my problem is constrained by:
 * Predominantly reads, especially when the data gets large and reads are quasi-random
 * I have lots of data to plow in, all of which has to be read
 * Although the problem could scale out and possibly fit entirely in RAM, that would require too much kit to be viable

So, my findings with Cassandra are:
 * Compaction is expensive. I need it, but
    1) it takes disk IO away from my reads
    2) it destroys the file cache
   I've not had a chance to do extensive tests with the LevelDB-style (leveled) compaction.
 * Compaction has historically been too hard to configure
 * It is memory hungry

So for me the biggest features would be:
 * Cheaper compaction
 * Lower memory usage
 * Indexing of dynamic column names (e.g. a Lucene TermEnum over rowkey:colkey)
   I do a lot of checking against dynamic column names.

The great features are that redundancy and the live addition of shards are available out of the box.

I've also experimented with GoldenOrb and triggered updates, and I think a fair bit can be achieved for my problem with local data access. Through GoldenOrb and Hadoop Writables I managed to get both a BigTable and a Pregel access model onto my Cassandra data. It was schema-specific, but it provided a local compute model.

p


________________________________
From: Jonathan Ellis <jbellis@gmail.com>
To: user <user@cassandra.apache.org>
Sent: Tuesday, 1 November 2011, 22:59
Subject: Second Cassandra users survey

Hi all,

Two years ago I asked for Cassandra use cases and feature requests. [1]
The results [2] have been extremely useful in setting and prioritizing
goals for Cassandra development. But with the release of 1.0 we've
accomplished basically everything from our original wish list. [3]

I'd love to hear from modern Cassandra users again, especially if
you're usually a quiet lurker. What does Cassandra do well? What are
your pain points? What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in
public, feel free to reply to me privately and I will keep it off the
record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
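The wish-list item about indexing dynamic column names (a Lucene-TermEnum-style index over rowkey:colkey) can be illustrated with a small in-memory sketch. This is not a Cassandra or Lucene API: `ColumnNameIndex` and its methods are hypothetical names, terms are assumed to be plain strings of the form `rowkey:colname`, and row keys are assumed never to contain `:`. Keeping the terms sorted means you can seek to a prefix and walk forward in order, much as you would enumerate a term dictionary, so a check like "which columns of row X start with Y?" does not need to read the row itself.

```python
import bisect
from typing import Iterator, List


class ColumnNameIndex:
    """Hypothetical in-memory sorted term dictionary over "rowkey:colname" terms.

    Assumption: row keys never contain the ':' separator, otherwise a prefix
    scan for one row could also match terms belonging to another row.
    """

    SEP = ":"

    def __init__(self) -> None:
        self._terms: List[str] = []  # kept sorted at all times

    def _term(self, row_key: str, col_name: str) -> str:
        return f"{row_key}{self.SEP}{col_name}"

    def add(self, row_key: str, col_name: str) -> None:
        term = self._term(row_key, col_name)
        i = bisect.bisect_left(self._terms, term)
        # Only insert if the term is not already present (no duplicates).
        if i == len(self._terms) or self._terms[i] != term:
            self._terms.insert(i, term)  # O(n) list insert; fine for a sketch

    def contains(self, row_key: str, col_name: str) -> bool:
        term = self._term(row_key, col_name)
        i = bisect.bisect_left(self._terms, term)
        return i < len(self._terms) and self._terms[i] == term

    def columns_with_prefix(self, row_key: str, col_prefix: str = "") -> Iterator[str]:
        # Seek to the first term >= the prefix, then walk forward while it still matches.
        prefix = self._term(row_key, col_prefix)
        start = len(row_key) + len(self.SEP)
        i = bisect.bisect_left(self._terms, prefix)
        while i < len(self._terms) and self._terms[i].startswith(prefix):
            yield self._terms[i][start:]  # strip "rowkey:" to return the column name
            i += 1


if __name__ == "__main__":
    idx = ColumnNameIndex()
    for row, col in [("v1", "friend_9"), ("v1", "friend_7"), ("v1", "likes_3"), ("v2", "friend_1")]:
        idx.add(row, col)
    print(list(idx.columns_with_prefix("v1", "friend_")))  # prints ['friend_7', 'friend_9']
    print(idx.contains("v2", "likes_3"))  # prints False
```

A sorted Python list keeps the sketch short; an index meant for graph-sized data would more plausibly live in a sorted on-disk structure, but the seek-then-scan access pattern is the same.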