cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12114) Cassandra startup takes an hour because of N*N operation
Date Wed, 20 Jul 2016 13:47:20 GMT


Aleksey Yeschenko commented on CASSANDRA-12114:

[~jjirsa] Are you going to commit this one?

> Cassandra startup takes an hour because of N*N operation
> --------------------------------------------------------
>                 Key: CASSANDRA-12114
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Tom van der Woerdt
>            Assignee: Jeff Jirsa
>             Fix For: 3.0.x, 3.x
> (There's a previous version of this ticket, which was very wrong about the actual cause.
Original is quoted below)
> In, the function scrubDataDirectories loops over
all sstables and then for each sstable it cleans temporary files from its directory.
> Since there are many sstables in a directory, this ends up cleaning the same directory
many times.
> When using LeveledCompactionStrategy on a data set that is ~4TB per node, you can easily
end up with 200k files.
> With N sstables each triggering a scan of the N files in its directory, we get an N*N
operation (scrubDataDirectories) which ends up taking an hour (or more).
> (At this point I should probably point out that no, I am not sure about that. At all.
But I do know this takes an hour and jstack blames this function)
> As promised, original ticket below :
> {quote}
> A Cassandra cluster of ours has nodes with up to 4TB of data, in a single table using
leveled compaction having 200k files. While upgrading from 2.2.6 to 3.0.7 we noticed that
it took a while to restart a node. And with "a while" I mean we measured it at more than 60
> jstack shows something interesting :
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007f30db0ea400 nid=0xdb22 runnable [0x00007f30de122000]
>    java.lang.Thread.State: RUNNABLE
>     at Method)
>     at
>     at
>     at
>     at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(
>     at org.apache.cassandra.service.CassandraDaemon.setup(
>     at org.apache.cassandra.service.CassandraDaemon.activate(
>     at org.apache.cassandra.service.CassandraDaemon.main(
> {code}
> Going by the source of File.listFiles, it puts every file in a directory into an array
and *then* applies the filter.
> This is actually a known Java issue from 1999:
-- their "solution" was to introduce new APIs in JRE7. I guess that makes listFiles deprecated
for larger directories (like when using LeveledCompactionStrategy).
> tl;dr: because Cassandra uses, service startup can take an hour
for larger data sets.
> {quote}
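
The two observations in the ticket (the same directory gets cleaned once per sstable, and File.listFiles builds the full array before filtering) can be illustrated with a minimal sketch. This is not the patch attached to CASSANDRA-12114 and not Cassandra's actual code: the class name ScrubSketch, its method names, and the "*.tmp" pattern are hypothetical. It assumes it is enough to visit each distinct parent directory once, and it uses the Java 7 Files.newDirectoryStream API, which applies the glob while entries are being read.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ScrubSketch {
    // Hypothetical pattern for temporary files; not Cassandra's real naming scheme.
    static final String TMP_GLOB = "*.tmp";

    /** Collapse N sstable paths into the set of distinct parent directories. */
    static Set<Path> distinctDirectories(List<Path> sstables) {
        Set<Path> dirs = new LinkedHashSet<>();
        for (Path sstable : sstables) {
            Path parent = sstable.toAbsolutePath().getParent();
            if (parent != null) {
                dirs.add(parent);
            }
        }
        return dirs;
    }

    /** Delete matching temp files in one directory; returns how many were deleted. */
    static int cleanTemporaryFiles(Path dir) throws IOException {
        int deleted = 0;
        // The glob is evaluated as entries are read, so no full listing is built first.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, TMP_GLOB)) {
            for (Path entry : stream) {
                if (Files.deleteIfExists(entry)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    /** Scan each directory once, instead of once per sstable. */
    static int scrub(List<Path> sstables) throws IOException {
        int deleted = 0;
        for (Path dir : distinctDirectories(sstables)) {
            deleted += cleanTemporaryFiles(dir);
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scrub-sketch");
        List<Path> sstables = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            sstables.add(Files.createFile(dir.resolve("sstable-" + i + ".db")));
        }
        Files.createFile(dir.resolve("leftover.tmp"));
        System.out.println("directories scanned: " + distinctDirectories(sstables).size());
        System.out.println("temp files deleted: " + scrub(sstables));
    }
}
```

With N sstables in one directory, this visits the directory once rather than N times, so the total work is proportional to the number of files rather than its square.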

This message was sent by Atlassian JIRA
