cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8167) sstablesplit tool can be made much faster with few JVM settings
Date Thu, 13 Oct 2016 15:37:20 GMT


Joshua McKenzie commented on CASSANDRA-8167:

[~yukim]: {{tools/bin/}} contains settings we source in our batch files for
tools, so we should add the settings in there.

> sstablesplit tool can be made much faster with few JVM settings
> ---------------------------------------------------------------
>                 Key: CASSANDRA-8167
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Nikolai Grigoriev
>            Assignee: Yuki Morishita
>            Priority: Trivial
> I had to use the sstablesplit tool intensively to split some really huge sstables. The tool
> is painfully slow because it does compaction in a single thread.
> I have just found that on one of my machines the tool crashed when I was almost
> done with a 152Gb sstable (!!!).
> {code}
>  INFO 16:59:22,342 Writing Memtable-compactions_in_progress@1948660572(0/0 serialized/live bytes, 1 ops)
>  INFO 16:59:22,352 Completed flushing /cassandra-data/disk1/system/compactions_in_progress/system-compactions_in_progress-jb-79242-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1413904450653, position=69178)
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.nio.HeapByteBuffer.duplicate(
>         at org.apache.cassandra.utils.ByteBufferUtil.readBytes(
>         at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(
>         at
>         at
>         at org.apache.cassandra.db.RangeTombstoneList$InOrderTester.isDeleted(
>         at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(
>         at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(
>         at org.apache.cassandra.db.ColumnFamily.hasIrrelevantData(
>         at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeleted(
>         at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(
>         at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(
>         at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(
>         at
>         at
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(
>         at
>         at
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(
>         at org.apache.cassandra.db.compaction.SSTableSplitter.split(
>         at
> {code}
> This made me want to see what memory settings are used for the JVM running the
> tool, and I found that it runs with default Java settings (no settings at all).
> I tried applying the settings from C* itself, and this resulted in an over 40% speed
> increase: from ~5Mb/s to ~7Mb/s, measured on the compressed output. I believe
> this is mostly due to concurrent GC. My CPU usage increased to ~200%, but that is
> fine: this is an offline tool and the node is down anyway. I know that concurrent GC (at least
> something like -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled) normally
> improves the performance of even primitive single-threaded heap-intensive Java programs.
> I think it should be acceptable to apply the server JVM settings to this tool.
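
For anyone checking whether a tool JVM actually picked up the intended settings, here is a minimal sketch (not part of the ticket or of Cassandra) that prints the maximum heap and the garbage collectors the running JVM reports. The class name GcReport and its helper methods are hypothetical; the calls are standard java.lang.management APIs. The collector names printed depend on the JVM version and on the flags passed, and the CMS/ParNew flags quoted above are not accepted by recent JDK releases.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports heap limit and active collectors of this JVM.
public class GcReport {

    // Names of the garbage collectors registered with the platform MBean server.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Maximum heap the JVM will attempt to use, in MiB.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

Running it once with no options and once with the options under consideration (for example `java -Xmx4g GcReport`, where the heap size is only an illustration) shows whether the settings took effect before committing to a long split.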
