Date: Thu, 31 Jul 2014 17:55:39 +0000 (UTC)
From: "Russell Alexander Spitzer (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

    [ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081184#comment-14081184 ]

Russell Alexander Spitzer commented on CASSANDRA-7631:
------------------------------------------------------

Looks like that is as fast as I can go, CPU is pegged at max on my MBP. Let me clean up the code and I'll get a preview up. I'm relying on CQLSSTableWriter to buffer and do the writes which provides (3) for us but limits the program to 1 CQLSSTableWriter per process since it is not thread-safe.
I think (4) and (5) could be very helpful, though, in giving that code an easier job. (2) There is a lot of bad-argument parsing I still need to add. I'm trying to track down one bug at the moment; then I'll post a preliminary branch.

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it takes to initially load data. For machines with a large amount of RAM this becomes especially onerous because a very large amount of data needs to be placed on the machine before page-cache can be circumvented.
> To remedy this I suggest we add a top-level flag to Cassandra-Stress which would cause the tool to write directly to sstables rather than actually performing CQL inserts. Internally this would use CQLSSTableWriter to write directly to sstables while skipping any keys which are not owned by the node stress is running on. The same stress command run on each node in the cluster would then write unique sstables only containing data which that node is responsible for. Following this, no further network IO would be required to distribute data, as it would all already be correctly in place.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
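
The key-skipping idea in the issue description can be illustrated with a small, self-contained sketch. This is not the stress tool's code and does not use CQLSSTableWriter; it is a toy model in Python under loud assumptions: a 32-bit token space, MD5-derived tokens, one token per node, and no replication. The names `ToyRing`, `token_for`, and `keys_for_node` are hypothetical. It only shows that if every node runs the same key generator and keeps just the keys it owns, the per-node outputs are disjoint and together cover the whole data set.

```python
import bisect
import hashlib

# Toy token space; an assumption for illustration, not Cassandra's partitioner.
RING_SIZE = 2 ** 32


def token_for(key: str) -> int:
    """Map a key to a token in [0, RING_SIZE) using the first 4 bytes of MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyRing:
    """Hypothetical single-token-per-node ring without replication."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; keep both lists sorted by token.
        pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in pairs]
        self._nodes = [node for _, node in pairs]

    def owner_of(self, key: str) -> str:
        # A node owns the range (previous token, its token], so the owner is
        # the first node whose token is >= the key's token, wrapping around.
        idx = bisect.bisect_left(self._tokens, token_for(key))
        return self._nodes[idx % len(self._nodes)]


def keys_for_node(keys, ring, node):
    """Keep only the keys owned by `node`; every other key is skipped."""
    return [k for k in keys if ring.owner_of(k) == node]


if __name__ == "__main__":
    ring = ToyRing({
        "node-a": 0,
        "node-b": RING_SIZE // 3,
        "node-c": 2 * RING_SIZE // 3,
    })
    keys = [f"key-{i}" for i in range(1000)]
    for node in ("node-a", "node-b", "node-c"):
        # Each node would run this same loop and write only its own share.
        print(node, len(keys_for_node(keys, ring, node)))
```

In a real cluster the ownership check would have to come from the cluster's actual token metadata and replication strategy, which this sketch deliberately leaves out.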