cassandra-commits mailing list archives

From "Tom van der Woerdt (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-12114) Cassandra startup takes an hour because of N*N operation
Date Thu, 30 Jun 2016 16:41:10 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom van der Woerdt updated CASSANDRA-12114:
-------------------------------------------
    Description: 
(A previous version of this ticket got the actual cause very wrong; the original is quoted below.)

In org.apache.cassandra.db.ColumnFamilyStore, the method scrubDataDirectories loops over all
sstables, and for each sstable it cleans temporary files from that sstable's directory.

Since a directory contains many sstables, the same directory ends up being cleaned many
times.

When using LeveledCompactionStrategy on a data set of ~4TB per node, you can easily end
up with 200k files.

With N sstables each triggering a scan of a directory containing on the order of N files,
scrubDataDirectories becomes an N*N operation, which ends up taking an hour (or more).

(At this point I should probably point out that no, I am not sure about that at all. But
I do know this takes an hour, and jstack blames this function.)
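The quadratic pattern described above can be sketched in Java. This is a hypothetical simplification, not the actual ColumnFamilyStore code: directories and files are plain strings, and counting listed entries stands in for the real filesystem calls. Deduplicating the directories before scanning turns the N*N scan into a single pass.

{code}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the scrubDataDirectories pattern; names and types
// are illustrative, not taken from Cassandra's source.
public class ScrubSketch {

    // Quadratic: every sstable triggers a full listing of its own directory,
    // so a directory holding N sstables gets listed N times.
    static int scrubPerSSTable(Map<String, List<String>> dirToFiles, List<String> sstableDirs) {
        int entriesListed = 0;
        for (String dir : sstableDirs) {
            entriesListed += dirToFiles.get(dir).size(); // one full directory scan
        }
        return entriesListed;
    }

    // Linear: deduplicate the directories first, then list each exactly once.
    static int scrubPerDirectory(Map<String, List<String>> dirToFiles, List<String> sstableDirs) {
        int entriesListed = 0;
        for (String dir : new LinkedHashSet<>(sstableDirs)) {
            entriesListed += dirToFiles.get(dir).size();
        }
        return entriesListed;
    }

    public static void main(String[] args) {
        int n = 1000; // sstables, all in one data directory
        List<String> files = new ArrayList<>();
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            files.add("sstable-" + i + "-Data.db");
            dirs.add("/data/ks/table");
        }
        Map<String, List<String>> fs = Map.of("/data/ks/table", files);
        System.out.println(scrubPerSSTable(fs, dirs));   // 1000000 entries listed
        System.out.println(scrubPerDirectory(fs, dirs)); // 1000 entries listed
    }
}
{code}

At 200k files the gap is 200k entries touched versus 40 billion, which is consistent with a startup stall of an hour or more.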

As promised, the original ticket is quoted below:

{quote}
One of our Cassandra clusters has nodes with up to 4TB of data in a single table, using leveled
compaction and holding 200k files. While upgrading from 2.2.6 to 3.0.7 we noticed that
restarting a node took a while; by "a while" I mean we measured it at more than 60 minutes.

jstack shows something interesting:
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f30db0ea400 nid=0xdb22 runnable [0x00007f30de122000]
   java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.list(Native Method)
    at java.io.File.list(File.java:1122)
    at java.io.File.listFiles(File.java:1248)
    at org.apache.cassandra.io.sstable.Descriptor.getTemporaryFiles(Descriptor.java:172)
    at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:599)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:245)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685)
{code}

Going by the source of File.listFiles, it puts every entry in the directory into an array and
*then* applies the filter.

This is actually a known Java issue from 1999: http://bugs.java.com/view_bug.do?bug_id=4285834
-- their "solution" was to introduce new APIs in Java 7 (NIO.2). I guess that makes listFiles
effectively deprecated for large directories (such as those produced by LeveledCompactionStrategy).


tl;dr: because Cassandra uses java.io.File.listFiles, service startup can take an hour for
larger data sets.
{quote}
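The listFiles behaviour the quote describes, and the Java 7 NIO.2 alternative the linked bug introduced, can be contrasted with a small sketch. The class, file names, and counts here are illustrative assumptions: listFiles(FilenameFilter) materializes the whole entry array before filtering, while Files.newDirectoryStream applies its glob lazily, one entry at a time.

{code}
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative comparison of eager java.io listing versus lazy NIO.2 listing.
public class ListingDemo {

    // Creates a throwaway directory with one "tmp" file and one data file.
    static Path makeDemoDir() {
        try {
            Path dir = Files.createTempDirectory("listing-demo");
            Files.createFile(dir.resolve("aa-tmp-1.db"));
            Files.createFile(dir.resolve("bb-Data.db"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // java.io.File: the full array of entries is built first, and only then
    // is the FilenameFilter applied to each name.
    static int legacyTmpCount(Path dir) {
        File[] matches = dir.toFile().listFiles((d, name) -> name.contains("tmp"));
        return matches == null ? 0 : matches.length;
    }

    // NIO.2 (Java 7): the glob filter is evaluated while iterating, so
    // entries are streamed instead of materialized up front.
    static int nioTmpCount(Path dir) {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*tmp*")) {
            for (Path ignored : stream) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        Path dir = makeDemoDir();
        System.out.println(legacyTmpCount(dir)); // 1
        System.out.println(nioTmpCount(dir));    // 1
    }
}
{code}

Both find the same matches; the difference is that the java.io path allocates work proportional to the full directory size before the filter ever runs, which is what hurts with 200k entries.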



> Cassandra startup takes an hour because of N*N operation
> --------------------------------------------------------
>
>                 Key: CASSANDRA-12114
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12114
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tom van der Woerdt
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
