incubator-cassandra-user mailing list archives

From Jonathan Shook <jsh...@gmail.com>
Subject Re: Read Latency
Date Tue, 11 May 2010 15:08:00 GMT
RAID may be less valuable to you here. It would be more useful to
split the storage as described at
http://wiki.apache.org/cassandra/CassandraHardware

When Cassandra is accessing effectively random parts of a large data
store, expect it to be constantly hitting certain "always hot" parts
of files, and doing random reads on others. The "hot" data is
generally cached by your OS automatically.

When Cassandra is handling many insertions (changes) or deletions,
expect it to do bulk file streaming.

These two types of activity are easy to split apart, and keeping
streaming I/O on separate physical storage from random-access I/O can
be a tremendous benefit. From the literature so far, this will usually
be more effective than trying to increase aggregate disk performance
with both types of data on the same physical storage.
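
As a rough illustration of why extra spindles do not shorten a single
isolated read, here is a back-of-envelope sketch in Python. It assumes
a small uncached read is served by one drive, so its cost is dominated
by seek time plus rotational latency on that drive. The 7200 rpm figure
comes from the original post; the 8.5 ms average seek time and the
function names are assumptions made for illustration, not measurements
or anything Cassandra reports.

```python
# Back-of-envelope model of one uncached random read on a rotating disk.
# All figures are assumptions for illustration, not measurements.

def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the platter: half of one revolution, in ms."""
    return 0.5 * 60_000.0 / rpm

def random_read_ms(rpm: float, avg_seek_ms: float, seeks: int = 1) -> float:
    """Estimated time for a read that needs `seeks` head movements."""
    return seeks * (avg_seek_ms + avg_rotational_latency_ms(rpm))

RPM = 7200          # drive speed from the original post
AVG_SEEK_MS = 8.5   # assumed typical average seek for a 7200 rpm SATA drive

one = random_read_ms(RPM, AVG_SEEK_MS)
three = random_read_ms(RPM, AVG_SEEK_MS, seeks=3)

print(f"rotational latency: {avg_rotational_latency_ms(RPM):.2f} ms")
print(f"one seek: {one:.1f} ms, three seeks: {three:.1f} ms")
```

Under these assumptions each head movement costs roughly 12 to 13 ms,
so a read that needs a few seeks lands in the tens of milliseconds no
matter how many drives sit behind the volume. Any streaming activity
that shares the same spindles adds further head movement on top of
that, which is the case for putting the two access patterns on
separate devices.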

On Tue, May 11, 2010 at 9:57 AM, Wayne <wav100@gmail.com> wrote:
> I am evaluating Cassandra, and Read latency is the biggest concern in terms
> of performance. As I test various scenarios and configurations I am getting
> surprising results. I have a 2 node cluster with both nodes connected to
> direct attached storage. The read latency pulling data off the RAID 10
> storage is worse than off of the internal drive. The drives are of the same
> SATA 7200 rpm speed, and this does not make sense. This is for single,
> isolated requests, obviously in scale the RAID should perform better... I
> have not started testing concurrent reads in scale as the single reads are
> too slow to begin with. I am getting 20-30ms response time off of internal
> drives and 50-70 ms response time through the RAID volumes (as reported in
> cfstats). The system is totally idle and all data has been cleanly
> compacted. These both seem very high numbers. All cache has been turned off
> for testing as we expect our cache hit ratio to not be that good. More
> spindles usually speeds things up, but I am seeing the opposite. I am using
> default settings for configuration. My write latency is very good and in
> line with what I see in terms of posted benchmarks.
>
> What are the recommended solutions to reduce read latency in terms of CF
> definition, cassandra configuration, hardware, etc?
> Do more keyspaces & column families increase latency (I originally saw 3-5
> ms read latency with a small amount of data and 1 Keyspace/CF)?
> Shouldn't RAID 10 help overall latency and throughput (more, faster disks
> are better)?
> What is a "normal" expected read latency with no cache?
> I am using super columns, would read latency and overall performance be
> faster to use a compound column instead?
> I have many different CF to isolate different data (some with the same key),
> would I be better served to combine CFs and thereby reduce the number of CFs
> and possibly increase key cache hits (at the cost of bigger rows)? I am
> testing with 10 Keyspaces and 6 CFs each.
>
> Any recommendations would be appreciated.
>
> Thanks.
>
