cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-9573) OOM when loading sstables (system.hints)
Date Thu, 11 Jun 2015 13:54:01 GMT


Aleksey Yeschenko updated CASSANDRA-9573:
    Assignee: Sam Tunnicliffe  (was: Benedict)

> OOM when loading sstables (system.hints)
> ----------------------------------------
>                 Key: CASSANDRA-9573
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Assignee: Sam Tunnicliffe
>            Priority: Critical
>             Fix For: 2.2.0 rc2
>         Attachments: hs_err_pid11243.log, java-hints-issue-2015-06-09.snapshot, system.log,
> [~andrew.tolbert] discovered an issue while running endurance tests on 2.2. A node was
unable to start and was killed by the OOM killer.
> Briefly, Cassandra uses an excessive amount of memory when loading compressed sstables
(off-heap?). We initially saw the issue with system.hints before learning that it was related
to compression. system.hints uses LZ4 compression by default. With an sstable of, say,
8-10G, Cassandra is killed by the OOM killer after 1-2 minutes. I can reproduce the
bug every time locally.
> * the issue also happens if we have 10G of data split into 13MB sstables.
> * I can reproduce the issue if I put a lot of data in the system.hints table.
> * I cannot reproduce the issue with a standard table using the same compression (LZ4).
Something seems to be different when it's hints?
> You won't see anything in the node's system.log, but you'll see this in /var/log/syslog.log:
> {code}
> Out of memory: Kill process 30777 (java) score 600 or sacrifice child
> {code}
> The issue was introduced in this commit, but it is not related to the performance issue
in CASSANDRA-9240:
> The core dump and some YourKit snapshots are attached. I am not sure you will
be able to get useful information from them.
> core dump:
> Not sure if this is related, but all dumps and snapshots point to EstimatedHistogramReservoir
... and we can see many org.apache.cassandra.metrics:...
exceptions in system.log before it hangs and then crashes.
> To reproduce the issue: 
> 1. create a cluster of 3 nodes
> 2. start the whole cluster
> 3. shut down node2 and node3
> 4. write 10-15G of data on node1 with replication factor 3. You should see a lot of
> 5. stop node1
> 6. start node2 and node3
> 7. start node1; it should OOM.
> //cc [~tjake] [~benedict] [~andrew.tolbert]
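
Because the kill is reported only in the system log and not in Cassandra's own system.log, a small scanner can help confirm that a node was stopped by the OOM killer. The following is a minimal sketch, not part of the original report: the names `OomKill` and `find_oom_kills` are hypothetical, and the pattern matches only the exact wording quoted in the report (other kernel versions may phrase the message differently).

```python
import re
from typing import Iterable, Iterator, NamedTuple


class OomKill(NamedTuple):
    """One OOM-killer event parsed from a log line (hypothetical helper type)."""
    pid: int
    process: str
    score: int


# Assumption: the message has the wording quoted in the report, e.g.
# "Out of memory: Kill process 30777 (java) score 600 or sacrifice child".
_OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)")


def find_oom_kills(lines: Iterable[str]) -> Iterator[OomKill]:
    """Yield an OomKill for every line that contains the OOM-killer message."""
    for line in lines:
        match = _OOM_RE.search(line)
        if match:
            yield OomKill(int(match.group(1)), match.group(2), int(match.group(3)))


if __name__ == "__main__":
    # The path is taken from the report; adjust it for the local system.
    with open("/var/log/syslog.log", errors="replace") as log:
        for kill in find_oom_kills(log):
            if kill.process == "java":
                print(f"java process {kill.pid} was killed (score {kill.score})")
```

Filtering on the process name `java` is a simplification; on a host running several JVMs, the pid would need to be matched against the Cassandra process.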

This message was sent by Atlassian JIRA
