cassandra-commits mailing list archives

From "Karl Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files
Date Tue, 24 Apr 2012 21:54:07 GMT
Karl Mueller created CASSANDRA-4182:
---------------------------------------

             Summary: multithreaded compaction very slow with large single data file and a few tiny data files
                 Key: CASSANDRA-4182
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4182
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 1.0.9
         Environment: Redhat, Sun JDK 1.6.0_20-b02
            Reporter: Karl Mueller
            Priority: Minor


Turning on multithreaded (MT) compaction makes compaction take roughly 1.65 times as long in our
environment, which includes one very large SSTable and a few smaller ones, relative to either
0.8.x with MT turned off or 1.0.x with MT turned off.

compaction_throughput_mb_per_sec is set to 0.  

We currently compact about 500 GB of data nightly due to overwrites.  (LevelDB-style leveled
compaction will probably be enabled on the busy CFs once 1.0.x is rolled out completely.)  The
time the compaction takes is:

451m13.284s (multithreaded)
273m58.740s (multithreaded disabled)
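
For reference, the slowdown implied by these two timings can be worked out with a short script. This is only a sketch: the helper name parse_time is made up for illustration, and it assumes the durations are in the "NmS.SSSs" form printed above.

```python
import re

def parse_time(s):
    """Convert a duration such as '451m13.284s' to seconds (hypothetical helper)."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", s)
    if m is None:
        raise ValueError(f"unrecognized duration: {s!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

mt_on = parse_time("451m13.284s")    # multithreaded compaction enabled
mt_off = parse_time("273m58.740s")   # multithreaded compaction disabled

slowdown = mt_on / mt_off            # how many times longer the MT run took
relative_speed = mt_off / mt_on      # MT throughput as a fraction of non-MT

print(f"slowdown {slowdown:.2f}x, relative throughput {relative_speed:.0%}")
# prints: slowdown 1.65x, relative throughput 61%
```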

Our nodes run on SSDs and therefore have a high read and write rate available to them. The
primary CF they're compacting right now, which holds most of the data, is concentrated in one
very large file (300+ GB) and a few much smaller files (1-10 GB), since the CF has become far
less active.

I would expect multithreaded compaction to be no worse than single-threaded compaction,
or perhaps to cost more CPU for the same performance. Instead it runs at roughly 60% of the
speed with the same or higher CPU usage.

I have two graphs available from testing 2 or 3 compactions which demonstrate some interesting
characteristics.  1.0.9 was installed on the 21st with MT turned on.  Data before that date is
from 0.8.7 with MT turned off; 1.0.9 with MT turned off seems to perform as well as 0.8.7.

http://www.xney.com/temp/cass-irq.png  (interrupts)

http://www.xney.com/temp/cass-iostat.png (io bandwidth of disks)

These graphs show a large increase in rescheduling interrupts and only about half the disk
bandwidth in use.  I suspect the compaction threads are contending with each other (thrashing),
but I have not confirmed this.
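
For anyone wanting to reproduce the interrupt measurement on a Linux node, a minimal sketch follows. It assumes the kernel exposes rescheduling interrupts as the "RES:" row of /proc/interrupts; the function name and the sample text are made up for illustration and are not data from our nodes.

```python
def rescheduling_interrupts(proc_interrupts_text):
    """Sum the per-CPU counts on the 'RES:' row of /proc/interrupts-style text."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "RES:":
            # Numeric fields are per-CPU counters; trailing words are the label.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError("no RES: row found")

# Made-up sample in the /proc/interrupts layout, for illustration only.
sample = """\
           CPU0       CPU1
  0:         42          0   IO-APIC-edge      timer
RES:       1200       3400   Rescheduling interrupts
"""

total = rescheduling_interrupts(sample)
print(total)  # prints: 4600
```

Sampling this total before and after a compaction run, with MT on and off, gives a comparable number without needing the graphs.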
