cassandra-commits mailing list archives

From "Russell Alexander Spitzer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables
Date Thu, 31 Jul 2014 03:18:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080429#comment-14080429 ]

Russell Alexander Spitzer commented on CASSANDRA-7631:
------------------------------------------------------

Back on topic, I've been running a series of experiments to see how much faster (if at all)
writing through CQLSSTableWriter would be than just using the native client.

Here are some quick numbers from runs on my MacBook, with C* also running on the same MacBook
(for the native protocol runs):
{code}
     NOOP    = Just generate a row and do nothing with it (I know this may be optimized out)
     Native  = Run using -mode native cql3
     SSTable = Run passing rows to a queue that is consumed by a single thread running CQLSSTableWriter

     n=1M using the example user profile:
user n=1000000 no_warmup profile=cqlstress-example.yaml ops(insert=1) -rate threads=N -mode (sstable|native cql3)

            Partitions per second
Threads      NOOP    Native   SSTable
1           22765     10165     20917
2           38333     17247     38659
4           58089     26920     33956
8           72434     33507     29354
16          87837     34195     29354
{code}

So a single SSTable writer can keep up with one or two generator threads, but beyond that it
looks like contention on the ArrayBlockingQueue caps throughput. I'm going to look into getting
a thread-safe version of the SSTableWriter tomorrow (there is at the very least contention
on file naming); hopefully we'll be able to just tie a separate SSTableWriter to each generator.

If all else fails, we can just have them write to different directories and then rename the
sstables when we have finished.
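
As a rough illustration of the two layouts discussed above (this is not code from the patch),
the sketch below compares N generator threads feeding one ArrayBlockingQueue drained by a single
writer thread against N generator threads that each own their own writer. It uses only the JDK.
`RowSink` and `CountingSink` are hypothetical stand-ins for a per-thread SSTable writer, and the
queue capacity of 1024 is an arbitrary assumption; no Cassandra API is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class WriterFanOutSketch {

    /** Hypothetical stand-in for an SSTable writer; not a Cassandra interface. */
    interface RowSink {
        void write(String row);
    }

    /** Counts rows instead of writing them; each instance is used by one thread only. */
    static final class CountingSink implements RowSink {
        long rows;
        public void write(String row) { rows++; }
    }

    /** Sentinel a generator enqueues when it has produced its last row. */
    private static final String POISON = "\u0000END";

    /** Layout 1: every generator contends on one bounded queue, one thread writes. */
    static long sharedQueue(int generators, int rowsPerGenerator) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // capacity is arbitrary
        CountingSink sink = new CountingSink();
        Thread consumer = new Thread(() -> {
            int finished = 0;
            try {
                while (finished < generators) {
                    String row = queue.take();
                    if (POISON.equals(row)) finished++;
                    else sink.write(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        List<Thread> producers = new ArrayList<>();
        for (int g = 0; g < generators; g++) {
            final int id = g;
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < rowsPerGenerator; i++) queue.put("row-" + id + "-" + i);
                    queue.put(POISON);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producers.add(t);
            t.start();
        }
        for (Thread t : producers) join(t);
        join(consumer); // join makes the consumer's final count visible here
        return sink.rows;
    }

    /** Layout 2: each generator writes to its own sink, so no shared queue is needed. */
    static long sinkPerGenerator(int generators, int rowsPerGenerator) {
        List<CountingSink> sinks = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int g = 0; g < generators; g++) {
            final int id = g;
            CountingSink sink = new CountingSink();
            sinks.add(sink);
            Thread t = new Thread(() -> {
                for (int i = 0; i < rowsPerGenerator; i++) sink.write("row-" + id + "-" + i);
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) join(t);
        long total = 0;
        for (CountingSink s : sinks) total += s.rows;
        return total;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared queue rows:   " + sharedQueue(4, 100_000));
        System.out.println("per-thread sink rows: " + sinkPerGenerator(4, 100_000));
    }
}
```

Both layouts account for every generated row; the second simply removes the shared queue, which
is the point of tying one writer to each generator. With real writers, each sink would still
need its own output location or file naming scheme, as noted above.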

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it takes to initially
> load data. For machines with a large amount of RAM this becomes especially onerous because
> a very large amount of data needs to be placed on the machine before page-cache can be circumvented.
>
> To remedy this I suggest we add a top-level flag to cassandra-stress which would cause
> the tool to write directly to sstables rather than actually performing CQL inserts. Internally
> this would use CQLSSTableWriter to write directly to sstables while skipping any keys which
> are not owned by the node stress is running on. The same stress command run on each node in
> the cluster would then write unique sstables containing only data which that node is responsible
> for. Following this, no further network IO would be required to distribute data, as it would
> all already be correctly in place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)