Date: Wed, 21 Apr 2010 10:45:47 -0700
From: Anthony Molinaro <anthonym@alumni.caltech.edu>
To: user@cassandra.apache.org
Subject: Re: Cassandra 0.5.1 restarts slow


On Wed, Apr 21, 2010 at 12:21:31PM -0500, Jonathan Ellis wrote:
> [moving to user@]
> 
> 0.6 fixes replaying faster than it can flush.

Yeah, I noticed some of those fixes, and will probably take the leap into
0.6 if I can keep my cluster running. It's not doing too badly (I do about
400K reads and 250K writes per minute spread over 23 nodes), but some
of the m1.large instances frequently get into this backed-up state.
So I need to keep the cluster running first.
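
For a rough sense of scale, the cluster-wide figures above can be turned
into per-node rates. This is only a back-of-the-envelope sketch: it assumes
load is spread evenly over the 23 nodes and ignores the replication factor
(not stated in this thread), which would multiply the writes each node
actually handles. The function name is made up for illustration.

```python
def per_node_per_second(ops_per_minute, nodes):
    """Convert a cluster-wide per-minute rate into a per-node per-second rate.

    Assumes perfectly even distribution and no replication (an assumption,
    not something stated in the thread).
    """
    return ops_per_minute / 60 / nodes


reads = per_node_per_second(400_000, 23)
writes = per_node_per_second(250_000, 23)

print(f"reads:  {reads:.0f}/s per node")   # about 290
print(f"writes: {writes:.0f}/s per node")  # about 181
```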

> as for why it backs up in the first place before the restart, you can
> either (a) throttle writes [set your timeout lower, make your clients
> back off temporarily when it gets a timeoutexception]

What timeout is this?  Is it something in the Thrift API or a Cassandra
configuration setting?
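
The client-side half of suggestion (a), backing off when a write times out,
might look roughly like the sketch below. It is not tied to any particular
Cassandra client: the TimeoutException class here is a local stand-in for
whatever timeout error the client library raises, and write_with_backoff,
do_write and the delay values are hypothetical names and numbers chosen for
illustration.

```python
import random
import time


class TimeoutException(Exception):
    """Stand-in for the client library's timeout error (hypothetical here)."""


def write_with_backoff(do_write, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call do_write(), backing off exponentially with jitter after a timeout."""
    for attempt in range(max_attempts):
        try:
            return do_write()
        except TimeoutException:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the caller decide what to do
            # Double the delay ceiling on every retry, capped at max_delay,
            # and sleep a random amount below it so that many clients do not
            # all retry at the same instant.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

A caller would wrap each write, for example
write_with_backoff(lambda: client_insert(row)), where client_insert is
whatever insert call the application already makes. The point is only that
a timed-out client slows down instead of retrying immediately, which gives
the overloaded node a chance to drain its mutation queue.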

> or (b) add capacity.  (b) is recommended.

Yeah, I've been doing that by adding xlarge instances with RAID0 disks,
which work better, but I keep running into issues with the old instances,
which holds up this work.  I'll keep chugging along and hopefully get
things sorted.

-Anthony

> 
> https://issues.apache.org/jira/browse/CASSANDRA-685 will mitigate this
> but there is still no substitute for adding capacity to match demand.
> 
> On Tue, Apr 20, 2010 at 4:57 PM, Anthony Molinaro
> <anthonym@alumni.caltech.edu> wrote:
> > Hi,
> >
> > I have a cassandra cluster where a couple things are happening.  Every
> > once in a while a node will start to get backed up.  Checking tpstats I
> > see a very large value for ROW-MUTATION-STAGE.  Sometimes it will be able
> > to clear it if I give it enough time, other times the vm OOMs.  With some
> > nodes I also see this happen during restarts, I'll restart and have to
> > wait 6-12 hours for the node to not be marked as 'Down'.
> > I've seen
> > http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
> > and ended up with the following settings.
> >
> > KeysCachedFraction            : 0.01
> > MemtableSizeInMB              : 100
> > MemtableObjectCountInMillions : 0.5
> > Heap                          : -Xmx5G
> >
> > I only have 2 CFs in this instance and entries are small so in most cases
> > I hit MemtableObjectCountInMillions first and total MemtableSizeInMB is
> > about 60MB-120MB for the 2 CFs combined.
> >
> > Anyone have any pointers on where to look next?  These are m1.large EC2
> > instances (I want to move to xlarge to get more memory, but haven't yet
> > gotten clarification on the best process for node replacement, per my
> > other thread).
> >
> > Thanks,
> >
> > -Anthony
> >
> > --
> > ------------------------------------------------------------------------
> > Anthony Molinaro                           <anthonym@alumni.caltech.edu>
> >

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>