cassandra-commits mailing list archives

From "Constance Eustace (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9640) Nodetool repair of very wide, large rows causes GC pressure and destabilization
Date Wed, 24 Jun 2015 20:38:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600078#comment-14600078 ]

Constance Eustace edited comment on CASSANDRA-9640 at 6/24/15 8:37 PM:
-----------------------------------------------------------------------

Attached syslog.zip.

The destabilization occurs around the end of _0040 and continues into the next logfile.

I suspect that multiple huge/wide partition keys are being resolved in parallel, and that may be filling the heap.

entity_etljob is the processing table that has the ultra-huge rows.
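
One way to confirm how large those partitions actually are is nodetool's per-table statistics. A minimal sketch only, assuming the table lives in a keyspace named prod (the keyspace is not named in this report):

    # Hypothetical keyspace name "prod" -- substitute the real one.
    # The output includes "Compacted partition maximum bytes", i.e. the
    # on-disk size of the largest entity_etljob partition on this node.
    nodetool cfstats prod.entity_etljob

    # Partition size and cell count percentiles for the same table.
    nodetool cfhistograms prod entity_etljob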



was (Author: cowardlydragon):
Attached syslog.zip.

The destabilization occurs around the end of _0040 and continues into the next logfile.

I suspect that multiple huge/wide partition keys are being resolved in parallel, and that may be filling the heap.


> Nodetool repair of very wide, large rows causes GC pressure and destabilization
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9640
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9640
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: AWS, ~8GB heap
>            Reporter: Constance Eustace
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.1.x
>
>         Attachments: syslog.zip
>
>
> We've noticed our nodes becoming unstable with large, unrecoverable Old Gen GCs until OOM.
> This appears to be around the time of repair, and the specific cause seems to be one of our report computation tables that involves possibly very wide rows with 10GB of data in them. This is an RF 3 table in a four-node cluster.
> We truncate this occasionally, and we also had disabled this computation report for a bit and noticed better node stability.
> I wish I had more specifics. We are switching to an RF 1 table and will do more proactive truncation of the table.
> When things calm down, we will attempt to replicate the issue and watch GC and other logs.
> Any suggestion for things to look for/enable tracing on would be welcome.
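
For reference, a couple of things that could be enabled or tried here, sketched only and assuming the same hypothetical keyspace name prod; the repair flags are the standard 2.1 nodetool options:

    # GC visibility: uncomment the GC logging JVM_OPTS lines
    # (PrintGCDetails, PrintGCDateStamps, Xloggc, ...) in conf/cassandra-env.sh
    # and restart, so promotion failures and full GCs are logged with timestamps.

    # Repair only this node's primary ranges for the suspect table
    # (-pr = --partitioner-range), rather than every replicated range at once.
    nodetool repair -pr prod entity_etljob

    # Or narrow it further to a sub-range repair between two tokens
    # (the token values are placeholders).
    nodetool repair -st <start_token> -et <end_token> prod entity_etljob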



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
