cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-10821) OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
Date Fri, 22 Jul 2016 13:03:20 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-10821.
----------------------------------------
    Resolution: Won't Fix

Compaction is often described as, "worst case you will need 2x disk space while it writes
out new data before it can clean up the old," but you can also need 2x RAM for the off-heap
compression metadata and bloom filters.

Your best bet is probably to disable bloom filters until this compaction finishes.  Switching
to more aggressive compression may also help.
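
To give a rough sense of why the bloom filter setting matters for off-heap memory, here is a minimal sketch using the textbook sizing formula for an optimally tuned Bloom filter. It is not Cassandra's actual allocation code, and the key count is a hypothetical figure picked for illustration, not a measurement from this cluster. The per-table option involved is bloom_filter_fp_chance; a value of 1.0 corresponds to no filter at all.

```python
import math


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Textbook size, in bytes, of an optimally tuned Bloom filter.

    This is the standard formula, not Cassandra's implementation.
    """
    if not 0.0 < fp_chance <= 1.0:
        raise ValueError("fp_chance must be in (0, 1]")
    # Optimal bits per key: ln(1/p) / (ln 2)^2. With p = 1.0 this is 0 bits.
    bits_per_key = math.log(1.0 / fp_chance) / (math.log(2) ** 2)
    return num_keys * bits_per_key / 8


# Hypothetical number of partition keys on one node (assumption).
NUM_KEYS = 1_000_000_000

for p in (0.01, 0.1, 1.0):
    mib = bloom_filter_bytes(NUM_KEYS, p) / 2**20
    print(f"fp_chance={p}: ~{mib:,.0f} MiB")
```

Because a compaction holds filters for both the input and the output SSTables until the old files are released, the estimate roughly doubles while it runs, which is the "2x RAM" effect described above.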

> OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10821
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: EC2 32 x i2.xlarge split between us-east-1a,c and us-west-2a,b
> Linux  4.1.10-17.31.amzn1.x86_64 #1 SMP Sat Oct 24 01:31:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
> Cassandra version: 2.2.3
>            Reporter: Thom Bartold
>
> We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 writes per second, replication us-east:2 us-west:2, LeveledCompaction and DeflateCompressor.
> After about 48 hours some nodes had over 800 pending compactions and a few of them started getting killed by the Linux OOM killer. Priam attempts to restart the nodes, but they fail because of corrupted saved_caches files.
> Loading has finished, and the cluster is mostly idle, but 6 of the nodes were killed again last night by OOM.
> This is the log message where the node won't restart:
> ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables
> This is the dmesg where the node is terminated:
> [360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child
> [360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB
> This is what Compaction Stats look like currently:
> pending tasks: 1096
>                                      id   compaction type          keyspace      table    completed          total    unit   progress
>    93eb3200-9b58-11e5-b9f1-ffef1041ec45        Compaction   overlordpreprod   document   8670748796   839129219651   bytes      1.03%
>                                                Compaction            system      hints           30     1921326518   bytes      0.00%
> Active compaction remaining time :  27h33m47s
> Only 6 of the 32 nodes have compactions pending, and all on the order of 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)