Date: Mon, 27 Oct 2014 17:00:38 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8167) sstablesplit tool can be made much faster with few JVM settings

    [ https://issues.apache.org/jira/browse/CASSANDRA-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185391#comment-14185391 ]

Jonathan Ellis commented on CASSANDRA-8167:
-------------------------------------------

Do you have a heap dump from the OOM?

> sstablesplit tool can be made much faster with few JVM settings
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8167
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8167
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Nikolai Grigoriev
>            Priority: Trivial
>
> I had to use sstablesplit tool intensively to split some really huge sstables. The tool is painfully slow as it does compaction in one single thread.
> I have just found that on one of my machines the tool has crashed when I was almost done with 152Gb sstable (!!!).
> {code}
>  INFO 16:59:22,342 Writing Memtable-compactions_in_progress@1948660572(0/0 serialized/live bytes, 1 ops)
>  INFO 16:59:22,352 Completed flushing /cassandra-data/disk1/system/compactions_in_progress/system-compactions_in_progress-jb-79242-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1413904450653, position=69178)
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
>         at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:586)
>         at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
>         at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61)
>         at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:36)
>         at org.apache.cassandra.db.RangeTombstoneList$InOrderTester.isDeleted(RangeTombstoneList.java:751)
>         at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:422)
>         at org.apache.cassandra.db.DeletionInfo$InOrderTester.isDeleted(DeletionInfo.java:403)
>         at org.apache.cassandra.db.ColumnFamily.hasIrrelevantData(ColumnFamily.java:489)
>         at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeleted(PrecompactedRow.java:66)
>         at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
>         at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
>         at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:204)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:154)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>         at org.apache.cassandra.db.compaction.SSTableSplitter.split(SSTableSplitter.java:38)
>         at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:150)
> {code}
> This has triggered my desire to see what memory settings are used for JVM running the tool...and I have found that it runs with default Java settings (no settings at all).
> I have tried to apply the settings from C* itself and this resulted in over 40% speed increase. It went from ~5Mb/s to 7Mb/s - from the compressed output perspective. I believe this is mostly due to concurrent GC. I see my CPU usage has increased to ~200%. But this is fine, this is an offline tool, the node is down anyway.
> I know that concurrent GC (at least something like -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled) normally improves the performance of even primitive single-threaded heap-intensive Java programs.
> I think it should be acceptable to apply the server JVM settings to this tool.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
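
For readers who want to try the suggestion in the quoted report, the following is a minimal sketch of launching the splitter with explicit JVM settings. The GC flags are the ones named in the report and the main class comes from its stack trace. The heap size, classpath, and sstable path are hypothetical placeholders, and the way the real sstablesplit wrapper script builds its command line may differ. Recent JDK releases no longer accept the CMS flags, so this applies only to JVMs of that era.

```python
import shlex

# GC flags quoted in the report.
GC_FLAGS = [
    "-XX:+UseParNewGC",
    "-XX:+UseConcMarkSweepGC",
    "-XX:+CMSParallelRemarkEnabled",
]

# Entry point taken from the last frame of the stack trace in the report.
MAIN_CLASS = "org.apache.cassandra.tools.StandaloneSplitter"


def build_split_command(sstable, classpath, heap="4G", java="java"):
    """Return an argv list for an offline split with explicit JVM settings.

    `heap` and `classpath` are assumptions for illustration; the report does
    not say which heap size was used.
    """
    return [
        java,
        f"-Xms{heap}",  # fixed-size heap: initial size equals maximum size
        f"-Xmx{heap}",
        *GC_FLAGS,
        "-cp",
        classpath,
        MAIN_CLASS,
        sstable,
    ]


if __name__ == "__main__":
    # Both paths below are placeholders, not values from the report.
    cmd = build_split_command(
        "/path/to/keyspace-table-jb-1-Data.db",
        "/path/to/cassandra/lib/*",
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

The sketch only prints the command line, so it can be reviewed before anything is run against a stopped node.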