cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables
Date Mon, 28 Jul 2014 22:41:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077050#comment-14077050 ]

Brandon Williams commented on CASSANDRA-7631:
---------------------------------------------

bq. Stress seems like a perfectly reasonable place to put this, really. It also means we know
the data generated is compatible with the stress workload, which is important.

I agree with your latter point, but we could still reuse the code in a separate utility.
It just seems like stress has enough options as it is, and introducing an sstable writer would
make a lot of them nonsensical (consistency level, replication, etc.). I'd somewhat prefer
having a clear delineation, util-wise, between going over the network and writing to disk.

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it takes to initially
load data. For machines with a large amount of RAM this becomes especially onerous, because
a very large amount of data needs to be placed on the machine before the page cache can be circumvented.

> To remedy this, I suggest we add a top-level flag to cassandra-stress which would cause
the tool to write directly to sstables rather than actually performing CQL inserts. Internally
this would use CQLSSTableWriter to write directly to sstables while skipping any keys which
are not owned by the node stress is running on. The same stress command run on each node in
the cluster would then write unique sstables containing only the data that node is responsible
for. Following this, no further network I/O would be required to distribute data, as it would
all already be correctly in place.
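
The "skip any keys not owned by this node" step in the description above can be illustrated with a small, self-contained sketch. It is not the stress tool's code and does not call CQLSSTableWriter. The Ring class, token_for, keys_for_node, and the node names are hypothetical, the hash is a stand-in rather than Cassandra's Murmur3 partitioner, and only primary ownership is modelled (replication is ignored).

```python
import bisect
import hashlib


def token_for(key):
    """Toy partitioner: map a key (bytes) to a signed 64-bit token.

    Assumption: SHA-256 is used only for illustration; Cassandra's default
    partitioner uses Murmur3, which is not reproduced here.
    """
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """Hypothetical token ring mapping each token to its primary owner."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: [token, ...]}
        self._entries = sorted(
            (token, node)
            for node, tokens in node_tokens.items()
            for token in tokens
        )
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token):
        # A node owns the range (previous token, its own token], so the owner
        # is the first ring entry whose token is >= the key's token. Tokens
        # past the last entry wrap around to the first entry.
        index = bisect.bisect_left(self._tokens, token)
        if index == len(self._tokens):
            index = 0
        return self._entries[index][1]


def keys_for_node(ring, node, keys):
    """Yield only the keys whose primary owner is `node`; skip the rest."""
    for key in keys:
        if ring.owner(token_for(key)) == node:
            yield key


if __name__ == "__main__":
    ring = Ring({"node-a": [-2**62], "node-b": [0], "node-c": [2**62]})
    keys = [("key%d" % i).encode() for i in range(10)]
    # Each node runs the same generator and keeps only its own share.
    for node in ("node-a", "node-b", "node-c"):
        print(node, [k.decode() for k in keys_for_node(ring, node, keys)])
```

Because every node evaluates the same deterministic key sequence against the same ring, the per-node outputs are disjoint and together cover the full key set, which is the property the proposal relies on to avoid any network transfer afterwards.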



--
This message was sent by Atlassian JIRA
(v6.2#6252)
