From: Ian Soboroff <isoboroff@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 21 May 2010 10:37:35 -0400
Subject: Re: Scaling problems

On the to-do list for today.  Is there a tool to aggregate all the JMX
stats from all nodes?  I mean, something a little more complete than
Nagios.

Ian

On Fri, May 21, 2010 at 10:23 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> you should check the jmx stages I posted about
>
> On Fri, May 21, 2010 at 7:05 AM, Ian Soboroff <isoboroff@gmail.com> wrote:
> > Just an update.  I rolled the memtable size back to 128MB.  I am still
> > seeing that the daemon runs for a while with reasonable heap usage, but
> > then the heap climbs up to the max (6GB in this case, should be plenty)
> > and it starts GCing, without much getting cleared.  The client catches
> > lots of exceptions, where I wait 30 seconds and try again, with a new
> > client if necessary, but it doesn't clear up.
> >
> > Could this be related to memory leak problems I've skimmed past on the
> > list here?
> >
> > It can't be that I'm creating rows a bit at a time... once I stick a web
> > page into two CFs, it's over and done with for this application.  I'm
> > just trying to get stuff loaded.
> >
> > Is there a limit to how much on-disk data a Cassandra daemon can manage?
> > Is there runtime overhead associated with stuff on disk?
> >
> > Ian
> >
> > On Thu, May 20, 2010 at 9:31 PM, Ian Soboroff <isoboroff@gmail.com> wrote:
> >>
> >> Excellent leads, thanks.
> >> cassandra.in.sh has a heap of 6GB, but I didn't realize that I was
> >> trying to float so many memtables.  I'll poke tomorrow and report if
> >> it gets fixed.
> >> Ian
> >>
> >> On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>
> >>> Some possibilities:
> >>>
> >>> You didn't adjust Cassandra heap size in cassandra.in.sh (1GB is too
> >>> small).
> >>> You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will show
> >>> large pending ops -- large = 100s).
> >>> You're creating large rows a bit at a time and Cassandra OOMs when it
> >>> tries to compact (the OOM should usually be in the compaction thread).
> >>> You have your 5 disks each with a separate data directory, which will
> >>> allow up to 12 total memtables in-flight internally, and 12*256 is too
> >>> much for the heap size you have (FLUSH-WRITER-STAGE in tpstats will
> >>> show large pending ops -- large = more than 2 or 3).
> >>>
> >>> On Tue, May 18, 2010 at 6:24 AM, Ian Soboroff <isoboroff@gmail.com> wrote:
> >>> > I hope this isn't too much of a newbie question.  I am using
> >>> > Cassandra 0.6.1 on a small cluster of Linux boxes - 14 nodes, each
> >>> > with 8GB RAM and 5 data drives.  The nodes are running HDFS to
> >>> > serve files within the cluster, but at the moment the rest of
> >>> > Hadoop is shut down.  I'm trying to load a large set of web pages
> >>> > (the ClueWeb collection, but more is coming) and my Cassandra
> >>> > daemons keep dying.
> >>> >
> >>> > I'm loading the pages into a simple column family that lets me
> >>> > fetch out pages by an internal ID or by URL.  The biggest thing in
> >>> > the row is the page content, maybe 15-20k per page of raw HTML.
> >>> > There aren't a lot of columns.  I tried Thrift, Hector, and the BMT
> >>> > interface, and at the moment I'm doing batch mutations over Thrift,
> >>> > about 2500 pages per batch, because that was fastest for me in
> >>> > testing.
> >>> >
> >>> > At this point, each Cassandra node has between 500GB and 1.5TB
> >>> > according to nodetool ring.  Let's say I start the daemons up, and
> >>> > they all go live after a couple minutes of scanning the tables.  I
> >>> > then start my importer, which is a single Java process reading
> >>> > ClueWeb bundles over HDFS, cutting them up, and sending the
> >>> > mutations to Cassandra.  I only talk to one node at a time,
> >>> > switching to a new node when I get an exception.  As the job runs
> >>> > over a few hours, the Cassandra daemons eventually fall over,
> >>> > either with no error in the log or reporting that they are out of
> >>> > heap.
> >>> >
> >>> > Each daemon is getting 6GB of RAM and has scads of disk space to
> >>> > play with.  I've set the storage-conf.xml to take 256MB in a
> >>> > memtable before flushing (like the BMT case), and to do batch
> >>> > commit log flushes, and to not have any caching in the CFs.  I'm
> >>> > sure I must be tuning something wrong.  I would eventually like
> >>> > this Cassandra setup to serve a light request load but over say
> >>> > 50-100 TB of data.  I'd appreciate any help or advice you can
> >>> > offer.
> >>> >
> >>> > Thanks,
> >>> > Ian
> >>> >
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder of Riptano, the source for professional Cassandra support
> >>> http://riptano.com
> >>
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
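[Archive note: Ian's opening question -- aggregating JMX stats across all nodes -- had no stock answer in the 0.6 era; the usual route was to poll each node's MBean server yourself. A minimal sketch follows. The class name JmxHeapPoller is made up for illustration, and for brevity the demo reads the local JVM's platform MBeanServer; against real nodes you would instead open a javax.management.remote.JMXConnector to each node's JMX port (8080 in the stock 0.6 cassandra.in.sh, if memory serves) and run the same query over every connection.]

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxHeapPoller {
    // Read used/max heap from one MBeanServerConnection.  The same call
    // works unchanged on a connection obtained from a remote JMXConnector.
    static long[] heapUsage(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData usage =
            (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
        return new long[] { (Long) usage.get("used"), (Long) usage.get("max") };
    }

    public static void main(String[] args) throws Exception {
        // Local demo: the platform MBeanServer stands in for one node's
        // JMXConnector.getMBeanServerConnection().
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        long[] heap = heapUsage(local);
        System.out.printf("heap used: %d of %d bytes%n", heap[0], heap[1]);
    }
}
```

Looping heapUsage over one connection per node, plus the tpstats-style MBeans Jonathan mentions, is the whole "aggregator" -- the per-node numbers just need to be collected and printed side by side.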
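[Archive note: the knobs Jonathan's checklist touches live in storage-conf.xml. A hedged fragment under 0.6's element names -- paths and values are illustrative, not a recommendation made in the thread -- showing one data directory instead of five and a smaller per-memtable threshold:]

```xml
<!-- storage-conf.xml fragment (Cassandra 0.6); values are illustrative -->
<Storage>
  <!-- A single data directory caps how many memtables can be in flight
       at once, versus one per disk with five directories. -->
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>

  <!-- Smaller memtables: several 256MB memtables plus compaction in
       flight crowd even a 6GB heap. -->
  <MemtableThroughputInMB>64</MemtableThroughputInMB>

  <!-- Batch commit log sync, as in the BMT setup Ian describes. -->
  <CommitLogSync>batch</CommitLogSync>
  <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS>
</Storage>
```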