From: Ian Soboroff <isoboroff@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 21 May 2010 10:37:35 -0400
Subject: Re: Scaling problems

On the to-do list for today.  Is there a tool to aggregate all the JMX
stats from all nodes?  I mean, something a little more complete than
Nagios.

Ian

On Fri, May 21, 2010 at 10:23 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> you should check the jmx stages I posted about
>
> On Fri, May 21, 2010 at 7:05 AM, Ian Soboroff <isoboroff@gmail.com> wrote:
> > Just an update.  I rolled the memtable size back to 128MB.  I am still
> > seeing that the daemon runs for a while with reasonable heap usage, but
> > then the heap climbs up to the max (6GB in this case, should be plenty)
> > and it starts GCing, without much getting cleared.  The client catches
> > lots of exceptions, where I wait 30 seconds and try again, with a new
> > client if necessary, but it doesn't clear up.
> >
> > Could this be related to memory leak problems I've skimmed past on the
> > list here?
> >
> > It can't be that I'm creating rows a bit at a time... once I stick a web
> > page into two CFs, it's over and done with for this application.  I'm
> > just trying to get stuff loaded.
> >
> > Is there a limit to how much on-disk data a Cassandra daemon can manage?
> > Is there runtime overhead associated with stuff on disk?
> >
> > Ian
> >
> > On Thu, May 20, 2010 at 9:31 PM, Ian Soboroff <isoboroff@gmail.com> wrote:
> >>
> >> Excellent leads, thanks.
> >> cassandra.in.sh has a heap of 6GB, but I didn't realize that I was
> >> trying to float so many memtables.  I'll poke tomorrow and report if
> >> it gets fixed.
> >> Ian
> >>
> >> On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>
> >>> Some possibilities:
> >>>
> >>> You didn't adjust Cassandra heap size in cassandra.in.sh (1GB is too
> >>> small).
> >>> You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will show
> >>> large pending ops -- large = 100s).
> >>> You're creating large rows a bit at a time and Cassandra OOMs when it
> >>> tries to compact (the OOM should usually be in the compaction thread).
> >>> You have your 5 disks each with a separate data directory, which will
> >>> allow up to 12 total memtables in-flight internally, and 12*256 is too
> >>> much for the heap size you have (FLUSH-WRITER-STAGE in tpstats will
> >>> show large pending ops -- large = more than 2 or 3).
> >>>
> >>> On Tue, May 18, 2010 at 6:24 AM, Ian Soboroff <isoboroff@gmail.com> wrote:
> >>> > I hope this isn't too much of a newbie question.  I am using
> >>> > Cassandra 0.6.1 on a small cluster of Linux boxes - 14 nodes, each
> >>> > with 8GB RAM and 5 data drives.  The nodes are running HDFS to
> >>> > serve files within the cluster, but at the moment the rest of
> >>> > Hadoop is shut down.  I'm trying to load a large set of web pages
> >>> > (the ClueWeb collection, but more is coming) and my Cassandra
> >>> > daemons keep dying.
> >>> >
> >>> > I'm loading the pages into a simple column family that lets me
> >>> > fetch out pages by an internal ID or by URL.  The biggest thing in
> >>> > the row is the page content, maybe 15-20k per page of raw HTML.
> >>> > There aren't a lot of columns.  I tried Thrift, Hector, and the BMT
> >>> > interface, and at the moment I'm doing batch mutations over Thrift,
> >>> > about 2500 pages per batch, because that was fastest for me in
> >>> > testing.
> >>> >
> >>> > At this point, each Cassandra node has between 500GB and 1.5TB
> >>> > according to nodetool ring.  Let's say I start the daemons up, and
> >>> > they all go live after a couple minutes of scanning the tables.  I
> >>> > then start my importer, which is a single Java process reading
> >>> > ClueWeb bundles over HDFS, cutting them up, and sending the
> >>> > mutations to Cassandra.  I only talk to one node at a time,
> >>> > switching to a new node when I get an exception.  As the job runs
> >>> > over a few hours, the Cassandra daemons eventually fall over,
> >>> > either with no error in the log or reporting that they are out of
> >>> > heap.
> >>> >
> >>> > Each daemon is getting 6GB of RAM and has scads of disk space to
> >>> > play with.  I've set the storage-conf.xml to take 256MB in a
> >>> > memtable before flushing (like the BMT case), and to do batch
> >>> > commit log flushes, and to not have any caching in the CFs.  I'm
> >>> > sure I must be tuning something wrong.  I would eventually like
> >>> > this Cassandra setup to serve a light request load but over say
> >>> > 50-100 TB of data.  I'd appreciate any help or advice you can
> >>> > offer.
> >>> >
> >>> > Thanks,
> >>> > Ian
> >>> >
> >>>
> >>> --
> >>> Jonathan Ellis
> >>> Project Chair, Apache Cassandra
> >>> co-founder of Riptano, the source for professional Cassandra support
> >>> http://riptano.com
> >>
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
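[Archive note: Ian's opening question -- aggregating JMX stats across all nodes -- had no stock answer in the 0.6 era; the usual route was to poll each node's MBean server yourself. A minimal sketch follows. The class name JmxHeapPoller is made up for illustration, and for brevity the demo reads the local JVM's platform MBeanServer; against real nodes you would instead open a javax.management.remote.JMXConnector to each node's JMX port (8080 in the stock 0.6 cassandra.in.sh, if memory serves) and run the same query over every connection.]

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxHeapPoller {
    // Read used/max heap from one MBeanServerConnection.  The same call
    // works unchanged on a connection obtained from a remote JMXConnector.
    static long[] heapUsage(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData usage =
            (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
        return new long[] { (Long) usage.get("used"), (Long) usage.get("max") };
    }

    public static void main(String[] args) throws Exception {
        // Local demo: the platform MBeanServer stands in for one node's
        // JMXConnector.getMBeanServerConnection().
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        long[] heap = heapUsage(local);
        System.out.printf("heap used: %d of %d bytes%n", heap[0], heap[1]);
    }
}
```

Looping heapUsage over one connection per node, plus the tpstats-style MBeans Jonathan mentions, is the whole "aggregator" -- the per-node numbers just need to be collected and printed side by side.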
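[Archive note: the knobs Jonathan's checklist touches live in storage-conf.xml. A hedged fragment under 0.6's element names -- paths and values are illustrative, not a recommendation made in the thread -- showing one data directory instead of five and a smaller per-memtable threshold:]

```xml
<!-- storage-conf.xml fragment (Cassandra 0.6); values are illustrative -->
<Storage>
  <!-- A single data directory caps how many memtables can be in flight
       at once, versus one per disk with five directories. -->
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>

  <!-- Smaller memtables: several 256MB memtables plus compaction in
       flight crowd even a 6GB heap. -->
  <MemtableThroughputInMB>64</MemtableThroughputInMB>

  <!-- Batch commit log sync, as in the BMT setup Ian describes. -->
  <CommitLogSync>batch</CommitLogSync>
  <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS>
</Storage>
```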