From: Jonathan Ellis <jbellis@gmail.com>
Date: Thu, 26 May 2011 11:04:59 -0500
Message-ID: <BANLkTi=PEcwkN4j2eK-+nG=r9g3pNfrJgA@mail.gmail.com>
Subject: Re: OOM recovering failed node with many CFs
To: user@cassandra.apache.org

Sounds like a legitimate bug, although looking through the code I'm
not sure what would cause a tight retry loop on migration
announce/rectify. Can you create a ticket at
https://issues.apache.org/jira/browse/CASSANDRA ?

As a workaround, I would try manually copying the Migrations and
Schema sstable files from the system keyspace of the live node, then
restart the recovering one.
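
A minimal sketch of that copy step, assuming the live node's system
keyspace files have already been fetched into a local staging directory
and that the sstable files for the two column families start with
"Migrations-" and "Schema-". The function name, both directory arguments
and the prefixes are illustrative assumptions, not anything Cassandra
ships; stop the recovering node before copying.

```python
import shutil
from pathlib import Path

# Assumed file-name prefixes for the two system column families.
PREFIXES = ("Migrations-", "Schema-")


def copy_schema_sstables(src_dir, dst_dir):
    """Copy Migrations/Schema sstable files from src_dir into dst_dir.

    src_dir: staging directory holding files taken from the live node
             (hypothetical path chosen by the operator).
    dst_dir: system keyspace data directory of the recovering node.
    Returns the sorted list of file names that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        # Only regular files whose name starts with one of the prefixes.
        if path.is_file() and path.name.startswith(PREFIXES):
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

Every component file of each sstable sharing a prefix is copied together,
so a data file is not left behind without its companion files.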

On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti
<f.baronti@list-group.com> wrote:
> I can't seem to recover a failed node on a database where I made
> many updates to the schema.
>
> I have a small cluster with 2 nodes, around 1000 CFs (I know it's a lot, but
> it can't be changed right now), and ReplicationFactor=2.
> I shut down a node and cleaned its data entirely, then tried to bring it
> back up. The node starts fetching schema updates from the live node, but the
> operation fails halfway with an OOME.
> After some investigation, what I found is that:
>
> - I have a lot of schema updates (there are 2067 rows in the system.Schema
> CF).
> - The live node loads migrations 1-1000, and sends them to the recovering
> node (Migration.getLocalMigrations())
> - Soon afterwards, the live node checks the schema version on the recovering
> node and finds it has moved by a little - say it has applied the first 3
> migrations. It then loads migrations 3-1003, and sends them to the node.
> - This process is repeated very quickly (sends migrations 6-1006, 9-1009,
> etc).
>
> Analyzing the memory dump and the logs, it looks like each of these blocks
> of 1000 migrations is assembled into a single message and added to the
> OutboundTcpConnection queue. However, since the schema is big, the messages
> occupy a lot of space and are built faster than the connection can send
> them. Therefore, they accumulate in OutboundTcpConnection.queue until
> memory is completely filled.
>
> Any suggestions? Can I change something to make this work, apart from
> reducing the number of CFs?
>
> Flavio
>
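
The buildup described above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not Cassandra code: message size
is counted in migrations rather than bytes, the live node enqueues one
block per tick, the recovering node applies 3 migrations per tick, and
drain_per_tick is a made-up stand-in for how much the connection can send
per tick.

```python
from collections import deque


def simulate_queue(total=2067, batch=1000, step=3,
                   drain_per_tick=100, ticks=50):
    """Return the queued size (in migrations) after each tick.

    Each tick the sender enqueues one message holding up to `batch`
    migrations starting at the receiver's current version, the
    connection sends up to `drain_per_tick` migrations' worth of data,
    and the receiver's version advances by `step`.
    """
    queue = deque()   # sizes of messages still waiting to be sent
    applied = 0       # migrations the recovering node has applied
    history = []
    for _ in range(ticks):
        size = min(batch, total - applied)
        if size > 0:
            queue.append(size)
        budget = drain_per_tick
        while queue and budget > 0:
            sent = min(queue[0], budget)
            budget -= sent
            if sent == queue[0]:
                queue.popleft()        # whole message sent
            else:
                queue[0] -= sent       # message only partly sent
        applied = min(total, applied + step)
        history.append(sum(queue))
    return history


if __name__ == "__main__":
    growth = simulate_queue()
    print(growth[0], growth[-1])
```

With these illustrative defaults the backlog grows on every tick, because
each new block mostly repeats the previous one while the connection sends
only a fraction of it. Setting drain_per_tick equal to batch keeps the
queue empty.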


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com