incubator-cassandra-user mailing list archives

From James Masson <james.mas...@opigram.com>
Subject Re: Cassandra read throughput with little/no caching.
Date Fri, 21 Dec 2012 17:03:48 GMT


On 21/12/12 16:27, Yiming Sun wrote:
> James, using RandomPartitioner, the order of the rows is random, so when
> you request these rows in "Sequential" order (sort by the date?),
> Cassandra is not reading them sequentially.

Yes, I understand the "next" row to be retrieved in sequence is likely 
to be on a different node, and that the ordering is random. I'm using the 
word "sequential" to mean that the data is requested in a fixed order, 
with no row repeated until the next cycle. The data is not guaranteed 
to be of a size that is cache-able as a whole.

>
> The sizes of the data, 200MB, 300MB, and 40MB - are these the sizes of
> each column? Or are these the total sizes of the entire column families?
>   It wasn't too clear to me.  But if these are the total size of the
> column families, you will be able to fit them mostly in memory, so you
> should enable row cache.

Size of the column family, on a single node. Row caching is off at the 
moment.

Are you saying that I should increase the JVM heap to fit some data in 
the row cache, at the expense of linux disk caching?

Bear in mind that the data will only be re-requested, in the same 
sequence, on the next cycle - I'm not sure what value Cassandra's native 
caching adds if rows are never re-requested before being evicted.

My current key-cache hit-rates are near zero on this workload, hence I'm 
interested in Cassandra's zero-cache performance. Unless I can guarantee 
to fit the entire data-set in memory, it's difficult to justify spending 
memory on a Cassandra cache when LRU eviction and this workload mean it's 
never actually a benefit.
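To make that concrete, here's a toy sketch (class names and sizes are mine, nothing to do with Cassandra internals) of why an LRU cache smaller than the data-set yields a 0% hit rate under a repeated sequential scan:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruScanDemo {
    // Minimal LRU cache via LinkedHashMap's access-order mode.
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;
        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true => LRU eviction
            this.capacity = capacity;
        }
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    // Scan keys 0..datasetSize-1 twice; on a miss, "fetch" and cache the
    // row (evicting the least-recently-used entry). Returns pass-2 hits.
    static int hitsOnSecondPass(int cacheSize, int datasetSize) {
        LruCache<Integer, String> cache = new LruCache<>(cacheSize);
        for (int key = 0; key < datasetSize; key++) {
            cache.put(key, "row-" + key);             // pass 1: cold fill
        }
        int hits = 0;
        for (int key = 0; key < datasetSize; key++) { // pass 2
            if (cache.containsKey(key)) {
                hits++;
            } else {
                cache.put(key, "row-" + key);         // miss: fetch, cache
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Cache a tenth the size of the data-set: every pass-2 read misses,
        // because each key was evicted long before the scan wraps around.
        System.out.println(hitsOnSecondPass(100, 1000)); // prints 0
    }
}
```

The cache only starts paying for itself once it holds the entire data-set - with capacity equal to the key count, every pass-2 read hits.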

>
> I happen to have done some performance tests of my own on Cassandra,
> mostly on reads, and was also only able to get less than a 6MB/sec read
> rate out of a cluster of 6 nodes at RF=2 using a single-threaded client.
>   But it made a huge difference when I changed the client to an
> asynchronous multi-threaded structure.
>

Yes, I've been talking to the developers about having a separate thread 
or two to keep Cassandra busy, keeping Disruptor 
(http://lmax-exchange.github.com/disruptor/) fed with work to process.
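That matches the arithmetic: a synchronous single-threaded client can't issue a request until the previous one returns, so its throughput is capped at 1/latency - at our ~4000 microsec read latency that's only ~250 reads/sec, no matter how fast the cluster is. N concurrent requesters raise the ceiling to roughly N/latency. A rough sketch (the class and numbers are mine, for illustration only):

```java
public class ThroughputCeiling {
    // Upper bound on requests/sec achievable by `threads` synchronous
    // requesters, each blocked for latencyMs per round trip.
    static double ceiling(int threads, double latencyMs) {
        return threads * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        System.out.println(ceiling(1, 4.0)); // 250.0 reads/sec, 1 thread
        System.out.println(ceiling(8, 4.0)); // 2000.0 with 8 request threads
    }
}
```

Which is consistent with what we see: at 100 reads/sec we're well under the single-threaded ceiling, so the client isn't the bottleneck yet, but it would become one quickly.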

But none of this changes the fact that under this zero-cache workload, 
Cassandra seems very CPU-expensive for the throughput it delivers.
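As an aside for anyone reading the schema quoted below: cassandra-cli displays BytesType column names as hex, and the names in the column_metadata are just hex-encoded ASCII. A quick sketch to decode them (class and method names are mine):

```java
public class ColumnNameDecode {
    // Decode a hex string (as shown by cassandra-cli for BytesType
    // column names) into its ASCII form, two hex digits per character.
    static String hexToAscii(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexToAscii("64656c65746564")); // deleted
        System.out.println(hexToAscii("6576656e744964")); // eventId
        System.out.println(hexToAscii("7061796c6f6164")); // payload
    }
}
```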

thanks

James M

>
>
>
> On Fri, Dec 21, 2012 at 10:36 AM, James Masson
> <james.masson@opigram.com> wrote:
>
>
>     Hi,
>
>     thanks for the reply
>
>
>     On 21/12/12 14:36, Yiming Sun wrote:
>
>         I have a few questions for you, James,
>
>         1. how many nodes are in your Cassandra ring?
>
>
>     2 or 3 - depending on environment - it doesn't seem to make much
>     difference to throughput. What is a 30-minute task on a 2-node
>     environment is still a 30-minute task on a 3-node environment.
>
>
>         2. what is the replication factor?
>
>
>     1
>
>         3. when you say sequentially, what do you mean?  What
>         Partitioner do you use?
>
>
>     The data is organised by date - the keys are read sequentially in
>     order, only once.
>
>     Random partitioner - the data is equally spread across the nodes to
>     avoid hotspots.
>
>
>         4. how many columns per row?  how much data per row?  per column?
>
>
>     varies - described in the schema.
>
>     create keyspace mykeyspace
>        with placement_strategy = 'SimpleStrategy'
>        and strategy_options = {replication_factor : 1}
>        and durable_writes = true;
>
>
>     create column family entities
>        with column_type = 'Standard'
>        and comparator = 'BytesType'
>        and default_validation_class = 'BytesType'
>        and key_validation_class = 'AsciiType'
>        and read_repair_chance = 0.0
>        and dclocal_read_repair_chance = 0.0
>        and gc_grace = 0
>        and min_compaction_threshold = 4
>        and max_compaction_threshold = 32
>        and replicate_on_write = false
>        and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>        and caching = 'NONE'
>        and column_metadata = [
>          {column_name : '64656c65746564',
>          validation_class : BytesType,
>          index_name : 'deleted_idx',
>          index_type : 0},
>          {column_name : '6576656e744964',
>          validation_class : TimeUUIDType,
>          index_name : 'eventId_idx',
>          index_type : 0},
>          {column_name : '7061796c6f6164',
>          validation_class : UTF8Type}];
>
>     2 columns per row here - about 200MB of data in total
>
>
>     create column family events
>        with column_type = 'Standard'
>        and comparator = 'BytesType'
>        and default_validation_class = 'BytesType'
>        and key_validation_class = 'TimeUUIDType'
>        and read_repair_chance = 0.0
>        and dclocal_read_repair_chance = 0.0
>        and gc_grace = 0
>        and min_compaction_threshold = 4
>        and max_compaction_threshold = 32
>        and replicate_on_write = false
>        and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>        and caching = 'NONE';
>
>     1 column per row - about 300MB of data
>
>     create column family intervals
>        with column_type = 'Standard'
>        and comparator = 'BytesType'
>        and default_validation_class = 'BytesType'
>        and key_validation_class = 'AsciiType'
>        and read_repair_chance = 0.0
>        and dclocal_read_repair_chance = 0.0
>        and gc_grace = 0
>        and min_compaction_threshold = 4
>        and max_compaction_threshold = 32
>        and replicate_on_write = false
>        and compaction_strategy =
>     'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>        and caching = 'NONE';
>
>     variable columns per row - about 40MB of data.
>
>
>
>         5. what client library do you use to access Cassandra?
>         (Hector?)  Is your client code single-threaded?
>
>
>     Hector - yes, the processing side of the client is single threaded,
>     but is largely waiting for cassandra responses and has plenty of CPU
>     headroom.
>
>
>     I guess what I'm most interested in is the discrepancy between
>     read and write latency - although I understand reads carry a much
>     larger data volume per request, even though the request rate is
>     lower.
>
>     Network usage on a Cassandra box barely gets above 20Mbit/s,
>     including inter-cluster comms; it averages 5Mbit/s client<>Cassandra.
>
>     There is near zero disk I/O, and what little there is is served in
>     under 1ms. Storage is backed by a very fast SAN, but like I said
>     earlier, the dataset just about fits in the Linux disk cache. 2GB
>     VM, 512MB Cassandra heap - GCs are nice and quick, no JVM memory
>     problems; used heap oscillates between 280 and 350MB.
>
>     Basically, I'm just puzzled because Cassandra doesn't behave as I
>     would expect: huge CPU use for very little throughput. I'm
>     struggling to find anything wrong with the environment;
>     there's no bottleneck that I can see.
>
>     thanks
>
>     James M
>
>
>
>
>
>         On Fri, Dec 21, 2012 at 7:27 AM, James Masson
>         <james.masson@opigram.com> wrote:
>
>
>              Hi list-users,
>
>              We have an application with a relatively unusual access
>              pattern in Cassandra 1.1.6.
>
>              Essentially we read an entire multi-hundred-megabyte
>              column family sequentially (little chance of a Cassandra
>              cache hit), perform some operations on the data, and write
>              the data back to another column family in the same
>              keyspace.
>
>              We do about 250 writes/sec and 100 reads/sec during this
>              process. Write request latency is about 900 microsecs;
>              read request latency is about 4000 microsecs.
>
>              * First Question: Do these numbers make sense?
>
>              Read-request latency seems a little high to me; Cassandra
>              hasn't had a chance to cache this data, but it's likely in
>              the Linux disk cache, given the sizing of the
>              node/data/JVM.
>
>              thanks
>
>              James M
>
>
>
