cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9573) OOM when loading compressed sstables (system.hints)
Date Wed, 10 Jun 2015 22:53:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581188#comment-14581188 ]

Benedict commented on CASSANDRA-9573:
-------------------------------------

I suspect the mmapped files are being pinned by mlockall. We now map them on construction
(instead of re-mapping them repeatedly for each reader), but our preflight checks appear to
load the system tables prior to the mlockall call, so all system sstables present during
startup are forced to remain resident.

It's proving difficult to reproduce locally for reasons unrelated to this ticket. Could you
try running your script again with JNA disabled, and see if it behaves?
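
To illustrate the ordering theory above, here is a minimal, hypothetical JNA sketch (not Cassandra's actual startup path; the class and argument names are made up): mlockall(MCL_CURRENT) pins every page already mapped into the process, so any sstable mmapped before the call has to stay resident.

{code}
import com.sun.jna.Native;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch, not Cassandra's code: shows why mapping sstables
// before mlockall forces them to remain resident.
public class MlockallOrdering
{
    static { Native.register("c"); }          // direct-map libc via JNA
    private static final int MCL_CURRENT = 1; // lock all pages currently mapped (Linux)
    private static native int mlockall(int flags);

    public static void main(String[] args) throws Exception
    {
        // 1. "Preflight" work: mmap a large file (think a multi-GB system.hints sstable).
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel())
        {
            MappedByteBuffer data = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            // 2. mlockall now pins the mapping above: the kernel may no longer page it
            //    out, so the whole file counts against resident memory and can draw
            //    the attention of the OOM killer.
            if (mlockall(MCL_CURRENT) != 0)
                System.err.println("mlockall failed (check ulimit -l / CAP_IPC_LOCK)");

            data.load(); // touch the pages so the resident growth shows up in RSS
        }
    }
}
{code}

With MCL_CURRENT alone, pages mapped after the call are not locked; that is why re-testing with JNA disabled (so mlockall is never issued) should tell us whether this pinning is the culprit.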

> OOM when loading compressed sstables (system.hints)
> ---------------------------------------------------
>
>                 Key: CASSANDRA-9573
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9573
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.2.0 rc2
>
>         Attachments: hs_err_pid11243.log, java-hints-issue-2015-06-09.snapshot, system.log, yourkit.ss.tar.gz
>
>
> [~andrew.tolbert] discovered an issue while running endurance tests on 2.2. A node was not able to start and was killed by the OOM killer.
> Briefly, Cassandra uses an excessive amount of memory when loading compressed sstables (off-heap?). We initially saw the issue with system.hints before knowing it was related to compression; system.hints uses LZ4 compression by default. If we have an sstable of, say, 8-10G, Cassandra will be killed by the OOM killer after 1-2 minutes. I can reproduce that bug every time locally.
> * The issue also happens if we have 10G of data split into 13MB sstables.
> * I can reproduce the issue if I put a lot of data in the system.hints table.
> * I cannot reproduce the issue with a standard table using the same compression (LZ4). Something seems to be different when it's hints?
> You won't see anything in the node's system.log, but you'll see this in /var/log/syslog.log:
> {code}
> Out of memory: Kill process 30777 (java) score 600 or sacrifice child
> {code}
> The issue was introduced in the following commit, but is not related to the performance issue in CASSANDRA-9240: https://github.com/apache/cassandra/commit/aedce5fc6ba46ca734e91190cfaaeb23ba47a846
> The core dump and some YourKit snapshots are attached. I am not sure you will be able to get useful information from them.
> core dump: http://dl.alanb.ca/core.tar.gz
> Not sure if this is related, but all dumps and snapshots point to EstimatedHistogramReservoir ... and we can see many javax.management.InstanceAlreadyExistsException: org.apache.cassandra.metrics:... exceptions in system.log before it hangs and then crashes.
> To reproduce the issue: 
> 1. create a cluster of 3 nodes
> 2. start the whole cluster
> 3. shut down node2 and node3
> 4. write 10-15G of data on node1 with replication factor 3. You should see a lot of hints.
> 5. stop node1
> 6. start node2 and node3
> 7. start node1; you should OOM.
> //cc [~tjake] [~benedict] [~andrew.tolbert]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
