Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.co.uk;
  h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type;
  b=rkug5yc7DEJ9Oik0sg0W9x2OJsXjUjvzUFaA8YyziyJPn//QoQtDPNaUfzcPJbNeQ/8PBPOmZfJTE/pp3WC5UQkM8mMYFbRedY4ukGCkXqqNTwOWElYT5Xm9wNqGPMGUQkhKIPo/gStqm3OUhbCLFFiJBx7lVk8+Nex9wos7Lug=;
References: 
 <CALdd-zjvWsd4r7fS8xL0D2JDzP9Vok2+w7u_4wCyPkfGMxKbPg@mail.gmail.com>
 <1320324393.2047.YahooMailNeo@web132107.mail.ird.yahoo.com>
 <CAOT3TWrva=uR8xU66uWzYHATaw7c94PeOwbhrx1_A85tTu4PJw@mail.gmail.com>
Message-ID: <1320335435.22047.YahooMailNeo@web132110.mail.ird.yahoo.com>
Date: Thu, 3 Nov 2011 15:50:35 +0000 (GMT)
From: Peter Tillotson <slatemine@yahoo.co.uk>
Reply-To: Peter Tillotson <slatemine@yahoo.co.uk>
Subject: Re: Second Cassandra users survey
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
In-Reply-To: 
 <CAOT3TWrva=uR8xU66uWzYHATaw7c94PeOwbhrx1_A85tTu4PJw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="1263293997-941126134-1320335435=:22047"

--1263293997-941126134-1320335435=:22047
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

>>  * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey)
>>    I do a lot of checking against dynamic colnames
>
>I agree, some kind of integration with a search engine is required to
>support ad hoc queries as well as searching on column names. This will
>be really helpful.
>
>Currently, one of the options is to write in two places: Cassandra +
>search engine.
>

I was thinking of a disk-backed skiplist, with every nth rowkey:colkey
pulled into memory per sstable, as per Lucene's TermEnum.


________________________________
From: Mohit Anchlia <mohitanchlia@gmail.com>
To: user@cassandra.apache.org; Peter Tillotson <slatemine@yahoo.co.uk>
Sent: Thursday, 3 November 2011, 14:15
Subject: Re: Second Cassandra users survey

On Thu, Nov 3, 2011 at 5:46 AM, Peter Tillotson <slatemine@yahoo.co.uk> wrote:
> I'm using Cassandra as a big graph database, loading large volumes of data
> live and linking on the fly.

Not sure if Cassandra is the right fit to model complex vertexes and edges.

> The number of edges grows geometrically with data added, and they need to
> be read to continue linking the graph on the fly.
>
> Consequently, my problem is constrained by:
>  * Predominantly read - especially when data gets large and reads are quasi
> random
>  * I have lots of data to plow in, to be read
>  * Although the problem could scale out and possibly all be in RAM, it
> requires too much kit for that to be viable
> So, my findings with Cassandra are:
>  * Compaction is expensive, I need it but
>    1) It takes away disk IO from my reads
>    2) Destroys the file cache
>    I've not had a chance to do extensive tests with the LevelDB compaction
>  * Compaction has been too hard to configure historically
>  * Memory hungry
> So for me the biggest features would be
>  * Cheaper compaction
>  * Lower memory usage
>  * Indexing dynamic colnames (eg Lucene TermEnum against rowkey:colkey)
>    I do a lot of checking against dynamic colnames

I agree, some kind of integration with a search engine is required to
support ad hoc queries as well as searching on column names. This will
be really helpful.

Currently, one of the options is to write in two places: Cassandra +
search engine.
>
> The great features are that redundancy and live addition of shards are
> available out of the box.
>
> I've also experimented with Golden Orb and Triggered updates, I think there
> is a fair bit that can be achieved in my problem with local data access.
> Through GoldenOrb and Hadoop writables I managed to get both a BigTable and
> Pregel access model onto my Cassandra data. It was schema specific, but
> provided a local compute model.
> p
> ________________________________
> From: Jonathan Ellis <jbellis@gmail.com>
> To: user <user@cassandra.apache.org>
> Sent: Tuesday, 1 November 2011, 22:59
> Subject: Second Cassandra users survey
>
> Hi all,
>
> Two years ago I asked for Cassandra use cases and feature requests.
> [1]  The results [2] have been extremely useful in setting and
> prioritizing goals for Cassandra development.  But with the release of
> 1.0 we've accomplished basically everything from our original wish
> list. [3]
>
> I'd love to hear from modern Cassandra users again, especially if
> you're usually a quiet lurker.  What does Cassandra do well?  What are
> your pain points?  What's your feature wish list?
>
> As before, if you're in stealth mode or don't want to say anything in
> public, feel free to reply to me privately and I will keep it off the
> record.
>
> [1]
> http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
> [2]
> http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
> [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
>
>
--1263293997-941126134-1320335435=:22047--