cassandra-user mailing list archives

From Alexandre Linares <>
Subject Re: Ingesting from Hadoop to Cassandra
Date Thu, 28 May 2009 16:03:45 GMT
Jonathan, sorry for the lengthy emails! Hope this one's more readable.

So I'm fairly convinced it's not a Cassandra-side configuration problem; at least not one
that entails tweaking the object count threshold or the memtable size.

Given the client code at :

(my hadoop client implementation is pretty much identical)

I ran this ingestion client until it slowed to a crawl.  Without stopping it, I started a
second instance to see whether it would crawl as well, which I'd expect if my Cassandra
instance were caught up in GC.  Instead, the second instance ran fine at first and only started
crawling once it got through ~15k row inserts.  I then started a third ingestion instance,
which also ran fine until it got through ~15k row inserts.
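
For anyone who wants to pin the slowdown to a row count from a standalone client, here is a minimal sketch of a per-batch timing harness. It is not the client code referenced above: `insert_row` is a hypothetical stand-in for whatever call actually performs the Cassandra insert, and the function names, batch size, and slowdown factor are all assumptions chosen for illustration.

```python
import time


def measure_insert_rate(insert_row, rows, batch_size=1000, clock=time.monotonic):
    """Call insert_row(row) for each row and time every full batch.

    insert_row is a hypothetical callable wrapping the real insert.
    Returns a list of (rows_done, rows_per_second) tuples, one per batch.
    """
    rates = []
    done = 0
    batch_start = clock()
    for row in rows:
        insert_row(row)
        done += 1
        if done % batch_size == 0:
            now = clock()
            # Guard against a zero interval on very fast batches.
            elapsed = max(now - batch_start, 1e-9)
            rates.append((done, batch_size / elapsed))
            batch_start = now
    return rates


def first_slow_batch(rates, slowdown=5.0):
    """Return rows_done for the first batch that is at least `slowdown`
    times slower than the best batch seen before it, or None."""
    best = 0.0
    for done, rate in rates:
        if best > 0 and rate < best / slowdown:
            return done
        best = max(best, rate)
    return None
```

Running this from each fresh client instance and comparing where `first_slow_batch` fires would show whether the crawl tracks the per-client insert count (as the ~15k figure suggests) rather than the overall state of the cluster.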

I ran this against a 3-node Cassandra cluster.  The jconsole outputs for all 3 Cassandra nodes
over this entire scenario are attached to this email as a PNG (note: I stopped all ingestion
at ~19:55-19:58).

In my storage-conf.xml:

My cassandra table setup looks like the following:
        <Table Name="ClusterF">
            <ColumnFamily ColumnType="Super" ColumnSort="Time" Name="Composite"/>

The Cassandra JVMs are all running with -Xmx1500m+ and each dedicated server has 2G+ of RAM.


From: Jonathan Ellis <>
Sent: Wednesday, May 27, 2009 5:43:55 PM
Subject: Re: Ingesting from Hadoop to Cassandra

On Wed, May 27, 2009 at 6:39 PM, Alexandre Linares <> wrote:
> So it actually doesn't look blocked, but it's crawling.  Of course, in
> Hadoop, it always timed out (10 mins), before I could tell that it was
> crawling (I think)

So, back to the original hypothesis: you need to increase the memory
you are giving to the JVM, (in bin/ or increase the
flush frequency (by lowering the memtable object count threshold).

> Can you reproduce with a non-hadoop client program that you can share here?

BTW, I meant share the client code, not a client thread dump.  And
please use attachments for thread dumps or source files; it's really
impossible to read this thread on my phone with everything jammed into
the body. :)

