cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6568) sstables incorrectly getting marked as "not live"
Date Thu, 06 Feb 2014 17:32:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-6568:
---------------------------------------

    Attachment: 0001-add-jmx-method-to-get-non-active-sstables.patch

I've been trying to reproduce this today, but without any luck.

Attaching a patch that exposes a JMX method showing the difference between the files on disk
and the sstables Cassandra knows about; it should perhaps help us find the issue.
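
For context, a minimal sketch of the comparison such a method performs: list the Data.db
components in a column family's data directory and subtract the ones backing sstables the
server considers live. This is illustrative only, not the attached patch; the class and
method names are made up.

{noformat}
// Illustrative only -- not the attached patch. Shows the core comparison:
// Data.db files present in the data directory but absent from the live set.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public final class NonActiveSSTables
{
    /**
     * @param dataDir        e.g. /data/sstables/data/ks/cf
     * @param liveDataFiles  names of the Data.db components backing live sstables
     * @return Data.db files on disk that the server does not consider live
     */
    public static List<File> find(File dataDir, Set<String> liveDataFiles)
    {
        List<File> nonActive = new ArrayList<File>();
        File[] onDisk = dataDir.listFiles();
        if (onDisk == null)
            return nonActive;
        for (File f : onDisk)
            if (f.getName().endsWith("-Data.db") && !liveDataFiles.contains(f.getName()))
                nonActive.add(f);
        return nonActive;
    }
}
{noformat}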

> sstables incorrectly getting marked as "not live"
> -------------------------------------------------
>
>                 Key: CASSANDRA-6568
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6568
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.12 with several 1.2.13 patches
>            Reporter: Chris Burroughs
>            Assignee: Marcus Eriksson
>             Fix For: 1.2.15, 2.0.6
>
>         Attachments: 0001-add-jmx-method-to-get-non-active-sstables.patch
>
>
> {noformat}
> -rw-rw-r-- 14 cassandra cassandra 1.4G Nov 25 19:46 /data/sstables/data/ks/cf/ks-cf-ic-402383-Data.db
> -rw-rw-r-- 14 cassandra cassandra  13G Nov 26 00:04 /data/sstables/data/ks/cf/ks-cf-ic-402430-Data.db
> -rw-rw-r-- 14 cassandra cassandra  13G Nov 26 05:03 /data/sstables/data/ks/cf/ks-cf-ic-405231-Data.db
> -rw-rw-r-- 31 cassandra cassandra  21G Nov 26 08:38 /data/sstables/data/ks/cf/ks-cf-ic-405232-Data.db
> -rw-rw-r--  2 cassandra cassandra 2.6G Dec  3 13:44 /data/sstables/data/ks/cf/ks-cf-ic-434662-Data.db
> -rw-rw-r-- 14 cassandra cassandra 1.5G Dec  5 09:05 /data/sstables/data/ks/cf/ks-cf-ic-438698-Data.db
> -rw-rw-r--  2 cassandra cassandra 3.1G Dec  6 12:10 /data/sstables/data/ks/cf/ks-cf-ic-440983-Data.db
> -rw-rw-r--  2 cassandra cassandra  96M Dec  8 01:52 /data/sstables/data/ks/cf/ks-cf-ic-444041-Data.db
> -rw-rw-r--  2 cassandra cassandra 3.3G Dec  9 16:37 /data/sstables/data/ks/cf/ks-cf-ic-451116-Data.db
> -rw-rw-r--  2 cassandra cassandra 876M Dec 10 11:23 /data/sstables/data/ks/cf/ks-cf-ic-453552-Data.db
> -rw-rw-r--  2 cassandra cassandra 891M Dec 11 03:21 /data/sstables/data/ks/cf/ks-cf-ic-454518-Data.db
> -rw-rw-r--  2 cassandra cassandra 102M Dec 11 12:27 /data/sstables/data/ks/cf/ks-cf-ic-455429-Data.db
> -rw-rw-r--  2 cassandra cassandra 906M Dec 11 23:54 /data/sstables/data/ks/cf/ks-cf-ic-455533-Data.db
> -rw-rw-r--  1 cassandra cassandra 214M Dec 12 05:02 /data/sstables/data/ks/cf/ks-cf-ic-456426-Data.db
> -rw-rw-r--  1 cassandra cassandra 203M Dec 12 10:49 /data/sstables/data/ks/cf/ks-cf-ic-456879-Data.db
> -rw-rw-r--  1 cassandra cassandra  49M Dec 12 12:03 /data/sstables/data/ks/cf/ks-cf-ic-456963-Data.db
> -rw-rw-r-- 18 cassandra cassandra  20G Dec 25 01:09 /data/sstables/data/ks/cf/ks-cf-ic-507770-Data.db
> -rw-rw-r--  3 cassandra cassandra  12G Jan  8 04:22 /data/sstables/data/ks/cf/ks-cf-ic-567100-Data.db
> -rw-rw-r--  3 cassandra cassandra 957M Jan  8 22:51 /data/sstables/data/ks/cf/ks-cf-ic-569015-Data.db
> -rw-rw-r--  2 cassandra cassandra 923M Jan  9 17:04 /data/sstables/data/ks/cf/ks-cf-ic-571303-Data.db
> -rw-rw-r--  1 cassandra cassandra 821M Jan 10 08:20 /data/sstables/data/ks/cf/ks-cf-ic-574642-Data.db
> -rw-rw-r--  1 cassandra cassandra  18M Jan 10 08:48 /data/sstables/data/ks/cf/ks-cf-ic-574723-Data.db
> {noformat}
> I tried to do a user-defined compaction on sstables from November and got "it is not
> an active sstable".  The live sstable count from JMX was about 7 while on disk there were
> over 20.  Live vs. total size showed a difference of roughly 50 GiB.
> Forcing a GC from jconsole had no effect.  However, restarting the node resulted in live
> sstables/bytes *increasing* to match what was on disk.  User-defined compaction could then
> compact the November sstables.  This cluster was last restarted in mid-December.
> I'm not sure what effect "not live" had on other operations of the cluster.  From the
> logs it seems the files were sent at some point as part of repair, but I don't know
> whether they were being used for read requests.  Because the problem that got me looking
> in the first place was poor performance, I suspect they were used for reads (and the
> reads were slow because so many sstables were being read).  Based on their age, I presume
> they were at the least being excluded from compaction.
> I'm not aware of any isLive() or getRefCount() to programmatically confirm which nodes
> have this problem (one possible external check is sketched below).  In this cluster
> almost all columns have a 14-day TTL; based on the number of nodes with November
> sstables, the issue appears to be occurring on a significant fraction of the nodes.
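
Until a patched build is available, a hedged sketch of one way to flag affected nodes
from the outside: read the column family's LiveSSTableCount and disk-space attributes
over JMX (standard ColumnFamilyStoreMBean attributes in the 1.2 line) and compare them
against a count of Data.db files on disk. The host, port, keyspace/column family names,
and path below are placeholders.

{noformat}
// Hedged sketch: flags a node when on-disk Data.db files outnumber the
// sstables the server reports as live. The attribute names are the standard
// ColumnFamilyStoreMBean ones; everything else here is a placeholder.
import java.io.File;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public final class LiveSSTableCheck
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName cf = new ObjectName(
                "org.apache.cassandra.db:type=ColumnFamilies,keyspace=ks,columnfamily=cf");

            int liveCount   = (Integer) mbs.getAttribute(cf, "LiveSSTableCount");
            long liveBytes  = (Long) mbs.getAttribute(cf, "LiveDiskSpaceUsed");
            long totalBytes = (Long) mbs.getAttribute(cf, "TotalDiskSpaceUsed");

            // Count Data.db components on disk for the same column family.
            int onDisk = 0;
            File[] files = new File("/data/sstables/data/ks/cf").listFiles();
            if (files != null)
                for (File f : files)
                    if (f.getName().endsWith("-Data.db"))
                        onDisk++;

            System.out.printf("live=%d onDisk=%d liveBytes=%d totalBytes=%d%n",
                              liveCount, onDisk, liveBytes, totalBytes);
            if (onDisk > liveCount)
                System.out.println("possible non-live sstables on this node");
        }
        finally
        {
            jmxc.close();
        }
    }
}
{noformat}

The same connection can also drive the user-defined compaction described above: the
CompactionManager MBean (org.apache.cassandra.db:type=CompactionManager) exposes the
forceUserDefinedCompaction operation, which is what fails with "it is not an active
sstable" once a file has dropped out of the live set.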



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
