cassandra-commits mailing list archives

From "Benjamin Roth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12730) Thousands of empty SSTables created during repair - TMOF death
Date Tue, 08 Nov 2016 11:11:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647254#comment-15647254 ]

Benjamin Roth commented on CASSANDRA-12730:
-------------------------------------------

Thanks for your response.

My proposal was driven by the assumption that the session prepare causes most of these flushes,
but after reading some more code I also recognized that there are many more occasions where
flushes may occur - e.g. initiating a validation compaction.
So I will probably first have to log flushes with traces to see where all these flushes come
from. Unfortunately I am currently at Web Summit with poor connectivity, so I don't know when
I will be able to set up logging / graphing for flushes and be able to produce an appropriate
case.
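One way to trace where the flushes originate would be to raise the log level for the classes that perform them. This is only a sketch: Cassandra's logging goes through logback, but the exact class names below are assumptions about where the flush messages are emitted and may need adjusting.

```xml
<!-- Hypothetical additions to conf/logback.xml - the class names are
     guesses at where flush activity is logged, not verified. -->
<logger name="org.apache.cassandra.db.ColumnFamilyStore" level="TRACE"/>
<logger name="org.apache.cassandra.db.Memtable" level="TRACE"/>
```

The same could presumably be toggled at runtime without a restart via `nodetool setlogginglevel org.apache.cassandra.db.ColumnFamilyStore TRACE`.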

The info you asked for:
- 7 Nodes
- 1 DC
- VNodes (256 tokens per node)
- Repair is managed by reaper with parallel, full, subrange (around 3500 ranges)
- RF 3

I also have a question:
If a repair triggers a stream, doesn't it go through the regular write path like any other
mutation (to take care of index updates, MVs, ...)? That is also what it looks like to me when
reading StreamReceiveTask. If so, many small streams should not create many small SSTables
unless a flush is forced by other circumstances.
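To make that intuition concrete, here is a toy model (plain Python, not Cassandra code; the stream sizes and flush threshold are made up) contrasting flushing each incoming stream into its own SSTable with buffering streams in a memtable and flushing only past a size threshold:

```python
# Hypothetical byte sizes of incoming repair streams.
streams = [90, 53, 120, 200_000, 75]

# Flush-per-stream: every stream becomes its own SSTable, however tiny.
sstables_per_stream = len(streams)

# Memtable path: buffer writes and flush only past a threshold (made up).
FLUSH_THRESHOLD = 100_000  # bytes
sstables_memtable = 0
buffered = 0
for size in streams:
    buffered += size
    if buffered >= FLUSH_THRESHOLD:
        sstables_memtable += 1  # one reasonably sized SSTable
        buffered = 0
if buffered:
    sstables_memtable += 1  # final flush of the remainder

print(sstables_per_stream, sstables_memtable)  # prints: 5 2
```

Under this model the memtable path produces far fewer (and larger) SSTables for the same input, which is why forced flushes somewhere else would be needed to explain the observed behaviour.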

Thanks for your help!

> Thousands of empty SSTables created during repair - TMOF death
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-12730
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12730
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benjamin Roth
>            Priority: Critical
>
> Last night I ran a repair on a keyspace with 7 tables and 4 MVs, each containing a few
> hundred million records. After a few hours a node died because of "too many open files".
> Normally one would just raise the limit, but: we already set this to 100k. The problem
> was that the repair created roughly over 100k SSTables for a certain MV. The strange thing
> is that these SSTables had almost no data (like 53 bytes, 90 bytes, ...). Some of them (<5%)
> had a few hundred KB, and very few (<1%) had normal sizes like >= a few MB. I could understand
> that SSTables queue up as they are flushed and not compacted in time, but then they should
> have at least a few MB each (depending on config and available mem), right?
> Of course the node then runs out of FDs, and I guess it is not a good idea to raise the
> limit even higher, as I expect that this would just create even more empty SSTables before
> dying at last.
> Only 1 CF (MV) was affected. All other CFs (also MVs) behave sanely. Empty SSTables have
> been created evenly over time, 100-150 every minute. Among the empty SSTables there are also
> tables that look normal, having a few MBs.
> I didn't see any errors or exceptions in the logs until TMOF occurred - just tons of streams
> due to the repair (which I actually run via cs-reaper as subrange, full repairs).
> After having restarted that node (with no more repair running), the number of SSTables
> went down again as they were slowly compacted away.
> According to [~zznate] this issue may relate to CASSANDRA-10342 + CASSANDRA-8641
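A back-of-envelope check of the numbers in the description is consistent with the "few hours" timeline. The rate is taken from the report above; the one-descriptor-per-SSTable simplification is an assumption (each SSTable actually consists of several component files, so this is a lower bound on FD pressure):

```python
# Observed: 100-150 empty SSTables created per minute; nofile limit 100k.
rate_per_min = 150       # upper end of the observed creation rate
fd_limit = 100_000       # configured open-files limit

# Assuming roughly one open descriptor per SSTable (a lower bound:
# real SSTables have multiple component files on disk).
minutes = fd_limit / rate_per_min
print(f"{minutes / 60:.1f} hours")  # prints: 11.1 hours
```

So even under this optimistic simplification, a node creating tiny SSTables at that rate exhausts a 100k descriptor limit within half a day, matching the reported failure window.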



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
