cassandra-commits mailing list archives

From "Nikolai Grigoriev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-8167) sstablesplit tool can be made much faster with few JVM settings
Date Wed, 22 Oct 2014 19:39:33 GMT
Nikolai Grigoriev created CASSANDRA-8167:
--------------------------------------------

             Summary: sstablesplit tool can be made much faster with few JVM settings
                 Key: CASSANDRA-8167
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8167
             Project: Cassandra
          Issue Type: Improvement
          Components: Tools
            Reporter: Nikolai Grigoriev
            Priority: Trivial


I had to use the sstablesplit tool intensively to split some really huge sstables. The tool is
painfully slow because it does the compaction in a single thread.

I have just found that on one of my machines the tool crashed when I was almost done
with a 152Gb sstable (!!!).

{code}
 INFO 16:59:22,342 Writing Memtable-compactions_in_progress@1948660572(0/0 serialized/live bytes, 1 ops)
 INFO 16:59:22,352 Completed flushing /cassandra-data/disk1/system/compactions_in_progress/system-compactions_in_progress-jb-79242-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1413904450653, position=69178)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
        at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:586)
        at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36)
        at org.apache.cassandra.db.RangeTombstoneList$InOrderTester.isDeleted(RangeTombstoneList.java:751)
        at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:422)
        at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:403)
        at org.apache.cassandra.db.ColumnFamily.hasIrrelevantData(ColumnFamily.java:489)
        at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeleted(PrecompactedRow.java:66)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
        at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:204)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:154)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
        at org.apache.cassandra.db.compaction.SSTableSplitter.split(SSTableSplitter.java:38)
        at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:150)

{code}

This made me want to see what memory settings are used for the JVM running the tool, and
I found that it runs with the default Java settings (no settings at all).
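
For reference, here is a minimal sketch of how one could check which settings a JVM is actually running with. The class name JvmSettingsReport is hypothetical and not part of Cassandra; it only uses the standard java.lang.management API to print the options passed to the JVM, the maximum heap size and the active garbage collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Cassandra: reports the settings of the JVM it runs in.
public class JvmSettingsReport {

    // Builds a plain-text report of this JVM's options, heap limit and collectors.
    public static String describe() {
        StringBuilder sb = new StringBuilder();

        // Options passed to the JVM on startup (-Xmx, -XX:..., etc.); empty if none were given.
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("JVM arguments: ").append(args).append('\n');

        // Maximum amount of heap memory the JVM will attempt to use.
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        sb.append("Max heap (MiB): ").append(maxHeapMiB).append('\n');

        // One line per garbage collector registered in this JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("Collector: ").append(gc.getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] argv) {
        System.out.print(describe());
    }
}
```

Running it once with plain "java JvmSettingsReport" and once with the same options the tool's launcher passes should show whether any heap or GC settings are in effect. On a JVM that supports the CMS flags, adding -XX:+UseParNewGC -XX:+UseConcMarkSweepGC should change the reported collector names accordingly.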

I tried applying the settings from C* itself, and this resulted in an over 40% speed increase:
it went from ~5Mb/s to 7Mb/s, measured on the compressed output. I believe this is mostly
due to concurrent GC. My CPU usage has increased to ~200%, but this is fine: this is
an offline tool and the node is down anyway. As far as I know, concurrent GC (at least something like
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled) normally improves
the performance of even primitive single-threaded heap-intensive Java programs.

I think it should be acceptable to apply the server JVM settings to this tool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
