incubator-cassandra-user mailing list archives

From John Sanda <john.sa...@gmail.com>
Subject Re: Cassandra crashes
Date Mon, 09 Sep 2013 23:06:49 GMT
Check your file limits -
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html
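
If you want to double-check the limits the running Cassandra JVM actually
got, you can read /proc/<pid>/limits on Linux. A minimal sketch (pass the
Cassandra pid, e.g. from pgrep -f CassandraDaemon, as the argument):

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ShowLimits {
        public static void main(String[] args) throws Exception {
            // /proc/<pid>/limits lists the effective limits of that process
            for (String line : Files.readAllLines(
                    Paths.get("/proc/" + args[0] + "/limits"),
                    StandardCharsets.UTF_8)) {
                // "Max processes" caps native threads; "Max open files" caps fds
                if (line.contains("processes") || line.contains("open files")) {
                    System.out.println(line);
                }
            }
        }
    }

An OutOfMemoryError complaining "unable to create new native thread" usually
means the process ran into its max-processes (nproc) limit or out of native
memory for thread stacks, not that the heap itself is exhausted.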

On Friday, September 6, 2013, Jan Algermissen wrote:

>
> On 06.09.2013, at 13:12, Alex Major <al3xdm@gmail.com> wrote:
>
> > Have you changed the appropriate config settings so that Cassandra will
> > run with only 2GB RAM? You shouldn't see the nodes go down.
> >
> > Check out this blog post:
> > http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
> > It outlines the configuration settings needed to run Cassandra on 64MB of
> > RAM and might give you some insights.
>
> Yes, I have my fingers on the knobs and have also seen the article you
> mention - very helpful indeed, as were the replies so far. Thanks very
> much.
>
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my
> data import :-(
>
> Now, while it would be easy to scale out and up a bit until the default
> config of C* is sufficient, I'd really like to dive deep and try to
> understand why the thing is still going down, IOW, which of my config
> settings is so darn wrong that in most cases kill -9 remains the only way
> to shut down the Java process in the end.
>
>
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and
> HEAP_NEWSIZE="120M") in combination with some Cassandra activity that
> demands too much heap, right?
>
> So how do I find out what activity this is, and how do I sufficiently
> reduce it?
>
> What bugs me in general is that, AFAIU, C* is so eager to deliver massive
> write speed that it sort of forgets to protect itself from client demand.
> I would very much like to understand why and how that happens. I mean: no
> matter how many clients are flooding the database, it should not die from
> out-of-memory situations, regardless of any configuration specifics -
> should it?
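>
> (For now I work around this on the client side by bounding the number of
> in-flight async writes myself - a rough sketch; the limit of 128 and the
> class name are made up, and error handling is omitted:)
>
>     import com.datastax.driver.core.ResultSet;
>     import com.datastax.driver.core.ResultSetFuture;
>     import com.datastax.driver.core.Session;
>     import com.google.common.util.concurrent.FutureCallback;
>     import com.google.common.util.concurrent.Futures;
>     import java.util.concurrent.Semaphore;
>
>     public class ThrottledWriter {
>         // At most 128 uncompleted writes at any time; the blocking
>         // acquire() provides the back-pressure the server never applies.
>         private final Semaphore inFlight = new Semaphore(128);
>
>         public void write(Session session, String insertCql)
>                 throws InterruptedException {
>             inFlight.acquire();  // blocks while 128 writes are in flight
>             ResultSetFuture f = session.executeAsync(insertCql);
>             Futures.addCallback(f, new FutureCallback<ResultSet>() {
>                 public void onSuccess(ResultSet rs) { inFlight.release(); }
>                 public void onFailure(Throwable t) { inFlight.release(); }
>             });
>         }
>     }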
>
>
> The details:
>
> Currently, after a while, my client side (using the java-driver) reports
> more and more timeouts and then the following exception:
>
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected
> error occured server side: java.lang.OutOfMemoryError: unable to create
> new native thread
>
> On the server side, my cluster remains more or less in this condition:
>
> DN  xxxxx  71,33 MB   256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> UN  xxxxx  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
>
> The host that is down (it is the seed host, if that matters) still shows
> the running Java process, but I cannot shut down Cassandra or connect with
> nodetool, hence kill -9 to the rescue.
>
> On that host, I still see a load of around 1.
>
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
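>
> (To see the thread count grow before the OOM hits, one can poll a node
> over JMX - Cassandra listens on port 7199 by default. A minimal sketch,
> assuming remote JMX is reachable and unauthenticated; pass the node
> address as the argument:)
>
>     import java.lang.management.ManagementFactory;
>     import java.lang.management.ThreadMXBean;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
>
>     public class ThreadWatch {
>         public static void main(String[] args) throws Exception {
>             JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi");
>             JMXConnector jmx = JMXConnectorFactory.connect(url);
>             // standard ThreadMXBean of the remote Cassandra JVM
>             ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
>                 jmx.getMBeanServerConnection(),
>                 ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
>             while (true) {
>                 System.out.println(threads.getThreadCount() + " live threads");
>                 Thread.sleep(5000);
>             }
>         }
>     }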
>
>
> The system.log after a few seconds of import shows the following exception:
>
> java.lang.AssertionError: incorrect row data size 771030 written to
> /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db;
> correct is 771200
>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
>
> And then, after about 2 minutes, there are out-of-memory errors:
>
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java
> (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:693)
>         at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
>         at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java
> (line 192) Exception in thread Thread[CompactionExecutor:
>
>
> On the other hosts the log looks similar, but these keep running, despite
> the OutOfMemoryErrors.
>
> Jan
>
> >
> > On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
> > <jan.algermissen@nordsc.com> wrote:
> > Hi,
> >
> > I have set up C* in a very limited environment: 3 VMs at DigitalOcean
> > with 2GB RAM and 40GB SSDs, so my expectations about overall performance
> > are low.
> >
> > The keyspace uses a replication factor of 2.
> >
> > I am loading 1.5 million rows (each with 60 columns of a mix of numbers
> > and small texts, effectively 300,000 wide rows) in a quite 'aggressive'
> > way, using the java-driver and async update statements, roughly like the
> > sketch below.
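> >
> > (A minimal sketch of that loop - keyspace and table are from my schema,
> > but the column names and the Product type are made up for illustration:)
> >
> >     PreparedStatement ps = session.prepare(
> >         "INSERT INTO products.product (key, name, price) VALUES (?, ?, ?)");
> >     for (Product p : products) {
> >         // fire-and-forget: nothing ever waits on these futures, so
> >         // there is no client-side back-pressure at all
> >         session.executeAsync(ps.bind(p.key, p.name, p.price));
> >     }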
> >
> > After a while of importing data, I start seeing timeouts reported by the
> > driver:
> >
> > com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
> > timeout during write query at consistency ONE (1 replica were required
> > but only 0 acknowledged the write)
> >
> > and then later, host-unavailability exceptions:
> >
> > com.datastax.driver.core.exceptions.UnavailableException: Not enough
> > replica available for query at consistency ONE (1 required but only 0
> > alive).
> >
> > Looking at the 3 hosts, I see two C*s went down - which explains why I
> > still see some writes succeeding (that must be the one host left,
> > satisfying the consistency level ONE).
> >
> >
> > The logs tell me, AFAIU, that the servers shut down due to reaching the
> > heap size limit.
> >
> > I am irritated by the fact that the instances (it seems) shut themselves
> > down instead of limiting their amount of work. I understand that I need
> > to tweak the configuration and likely get more RAM, but still, I would
> > actually be satisfied with reduced service (and likely more timeouts in
> > the client). Right now it looks as if I would have to slow down the
> > client 'artificially' to prevent the loss of hosts - does that make sense?
> >
> > Can anyone explain whether this is intended behavior, meaning I'll just
> > have to accept the self-shutdown of the hosts? Or alternatively, what
> > data should I collect to investigate the cause further?
> >
> > Jan

-- 

- John
