incubator-cassandra-user mailing list archives

From Jan Algermissen <jan.algermis...@nordsc.com>
Subject Re: Cassandra crashes
Date Tue, 10 Sep 2013 00:16:49 GMT
Hi John,


On 10.09.2013, at 01:06, John Sanda <john.sanda@gmail.com> wrote:

> Check your file limits - 
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html

Did that already - without success.

Meanwhile I upgraded servers and I am getting closer.

By now I assume that heavy writes of rows of considerable size (as in: more than a couple of numbers) require a certain amount of RAM due to the C* architecture.

IOW, my throughput limit is how fast I can get the data to disk, but the minimum memory I need for that cannot be tuned down arbitrarily; it depends on the size of the stuff written to C*, because C* buffers writes in memtables so that it can persist them with sequential IO.

It is an interesting trade-off (if I have got it right by now :-)
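
For reference, the way I would now try to respect that trade-off on the client side is to cap the number of in-flight async writes, so the import cannot demand more from the servers than the memtables and flushing can keep up with. This is only a rough sketch, not my actual import code: the keyspace/table names are taken from the SSTable path in the log below, the columns and the permit count are made-up placeholders, and it assumes the 1.x java-driver API that was current at the time:

import java.util.concurrent.Semaphore;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class ThrottledImport {

    // Placeholder cap on concurrent async writes; tune against what the cluster can flush.
    private static final int MAX_IN_FLIGHT = 128;

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("products");   // keyspace name from the log path

        // Placeholder columns; the real statement would mirror the actual table.
        PreparedStatement insert =
                session.prepare("INSERT INTO product (id, name) VALUES (?, ?)");

        final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

        for (long i = 0; i < 1500000; i++) {
            inFlight.acquire();                           // block the producer while the cluster is busy
            BoundStatement bound = insert.bind(i, "row-" + i);
            ResultSetFuture future = session.executeAsync(bound);
            Futures.addCallback(future, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) {
                    inFlight.release();
                }
                public void onFailure(Throwable t) {
                    inFlight.release();                   // e.g. WriteTimeoutException: would log and retry here
                }
            });
        }

        inFlight.acquire(MAX_IN_FLIGHT);                  // wait for the remaining writes to finish
        cluster.shutdown();                               // shutdown() in the 1.x driver, close() in later versions
    }
}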

Jan

> 
> On Friday, September 6, 2013, Jan Algermissen wrote:
> 
> On 06.09.2013, at 13:12, Alex Major <al3xdm@gmail.com> wrote:
> 
> > Have you changed the appropriate config settings so that Cassandra will run with only 2GB RAM? You shouldn't find the nodes go down.
> >
> > Check out this blog post http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/, it outlines the configuration settings needed to run Cassandra on 64MB RAM and might give you some insights.
> 
> Yes, I have my fingers on the knobs and have also seen the article you mention - very helpful indeed, as are the replies so far. Thanks very much.
> 
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data import :-(
> 
> Now, while it would be easy to scale out and up a bit until the default config of C* is sufficient, I would really like to dive deep and try to understand why the thing still goes down, IOW, which of my config settings is so darn wrong that in most cases kill -9 remains the only way to shut down the Java process in the end.
> 
> 
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and HEAP_NEWSIZE="120M") in combination with some Cassandra activity that demands too much heap, right?
> 
> So how do I find out what activity this is, and how do I sufficiently reduce that activity?
> 
> What bugs me in general is that AFAIU C* is so eager to deliver massive write speed that it sort of forgets to protect itself from client demand. I would very much like to understand why and how that happens. I mean: no matter how many clients are flooding the database, it should not die from out-of-memory situations, regardless of any configuration specifics, should it?
> 
> 
> tl;dr
> 
> Currently my client side (using java-driver) reports, after a while, more and more timeouts and then the following exception:
> 
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occured server side: java.lang.OutOfMemoryError: unable to create new native thread;
> 
> On the server side, my cluster remains more or less in this condition:
> 
> DN  xxxxx     71,33 MB   256     34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256     32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> UN  xxxxx    198,49 MB  256     33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
> 
> The host that is down (it is the seed host, if that matters) still shows the running java process, but I cannot shut down Cassandra or connect with nodetool, hence kill -9 to the rescue.
> 
> On that host, I still see a load of around 1.
> 
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
> 
> 
> The system.log after a few seconds of import shows the following exception:
> 
> java.lang.AssertionError: incorrect row data size 771030 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; correct is 771200
>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> 
> 
> And then, after about 2 minutes there are out of memory errors:
> 
>  ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:693)
>         at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
>         at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:
> 
> 
> On the other hosts the log looks similar, but these keep running, despite the OutOfMemoryErrors.
> 
> 
> 
> 
> Jan
> 
> >
> >
> > On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:
> > Hi,
> >
> > I have set up C* in a very limited environment: 3 VMs at DigitalOcean with 2GB RAM and 40GB SSDs, so my expectations about overall performance are low.
> >
> > The keyspace uses a replication factor of 2.
> >
> > I am loading 1.5 million rows (each 60 columns of a mix of numbers and small texts, effectively 300,000 wide rows) in a quite 'aggressive' way, using java-driver and async update statements.
> >
> > After a while of importing data, I start seeing timeouts reported by the driver:
> >
> > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write
> >
> > and then later, host-unavailability exceptions:
> >
> > com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive).
> >
> > Looking at the 3 hosts, I see two C*s went down - which explains why I still see some writes succeeding (that must be the one host left, satisfying the consistency level ONE).
> >
> >
> > The logs tell me AFAIU that the servers shut down due to reaching the heap size limit.
> >
> > I am irritated by the fact that the instances (it seems) shut themselves down instead of limiting their amount of work. I understand that I need to tweak the configuration and likely get more RAM, but still, I would actually be satisfied with reduced service (and likely more timeouts in the client). Right now it looks as if I would have to slow down the client 'artificially' to prevent the loss of hosts - does that make sense?
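
If it helps, here is a rough sketch of what I mean by slowing the client down 'artificially': treat the driver's WriteTimeoutException / UnavailableException as a back-pressure signal and back off before retrying. The helper name and the retry/delay numbers below are invented for illustration, not taken from my actual import code:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.UnavailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

// Illustrative helper: retry a write a few times, sleeping longer after each
// timeout/unavailable response so the cluster gets a chance to catch up.
class BackoffWriter {
    static void writeWithBackoff(Session session, BoundStatement stmt) throws InterruptedException {
        long sleepMs = 100;                       // invented initial delay
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                session.execute(stmt);            // synchronous write
                return;
            } catch (WriteTimeoutException | UnavailableException e) {
                Thread.sleep(sleepMs);            // back off before trying again
                sleepMs *= 2;
            }
        }
        throw new RuntimeException("write still failing after retries; giving up");
    }
}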
> >
> > Can anyone explain whether this is intended behavior, meaning I'll just have to accept the self-shutdown of the hosts? Or alternatively, what data should I collect to investigate the cause further?
> >
> > Jan
> >
> 
> 
> 
> -- 
> 
> - John

