cassandra-commits mailing list archives

From "Tom Petracca (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-11623) Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
Date Wed, 20 Apr 2016 18:33:25 GMT
Tom Petracca created CASSANDRA-11623:
----------------------------------------

             Summary: Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
                 Key: CASSANDRA-11623
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Tom Petracca
            Priority: Minor
         Attachments: compactiontask_profile.png

Been doing some performance tuning and profiling of my Cassandra cluster and noticed that
compaction was particularly slow for tables that I know to have very short rows.
Profiling shows a ton of time being spent in BigTableWriter.getOnDiskFilePointer(),
and attaching strace to a CompactionTask shows that the majority of time is being spent in
lseek (called by getOnDiskFilePointer), and not in read or write.

Going deeper, it looks like we call getOnDiskFilePointer for each row (sometimes multiple
times per row) in order to see if we've reached our expected sstable size and should start a
new writer.  This is pretty unnecessary: every call costs an lseek, which adds up quickly
when rows are very short.
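
One possible way to avoid the per-row lseek, sketched here as a standalone illustration and
not as Cassandra's actual code or the fix adopted for this ticket: keep a running count of
bytes written and compare that counter against the expected size. The class
CountingRowWriter and its methods are hypothetical names. The sketch also assumes an
uncompressed file, where bytes handed to the channel equal bytes on disk; with compression
the counter would have to be fed by the compressed chunk sizes instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; this is not Cassandra's BigTableWriter.
final class CountingRowWriter implements AutoCloseable {
    private final FileChannel channel;
    // Maintained in memory on every append, so reading it needs no system call.
    private long bytesWritten;

    CountingRowWriter(Path path) throws IOException {
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
    }

    // Appends one serialized row and updates the running byte count.
    void append(byte[] row) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(row);
        while (buf.hasRemaining()) {
            bytesWritten += channel.write(buf);
        }
    }

    long bytesWritten() {
        return bytesWritten;
    }

    // Size check that consults the counter instead of asking the OS for the
    // file position (which is where the lseek calls in the profile come from).
    boolean reachedTargetSize(long targetBytes) {
        return bytesWritten >= targetBytes;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

With this shape, the "should we start a new writer?" check after each row is a single long
comparison.  A coarser alternative under the same assumptions would be to keep the existing
file-pointer call but only consult it every N rows.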



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
