incubator-cassandra-user mailing list archives

From Stanley Xu <wenhao...@gmail.com>
Subject Re: How to configure Cassandra params to handle heavy write, light read, short TTL scenario
Date Fri, 19 Apr 2013 01:55:07 GMT
Hi Aaron,

1. A 10 ms timeout is the maximum we can accept.
2. It is a random key access, not a range scan.
3. We have only one column family in that keyspace, and we select the
columns.

Thanks.

Best wishes,
Stanley Xu


On Fri, Apr 19, 2013 at 2:22 AM, aaron morton <aaron@thelastpickle.com> wrote:

> > Would it be possible to configure Cassandra to keep something like a
> > queue of memtables in memory, for example four memtables (mem1, mem2,
> > mem3, mem4) ordered by time, so that Cassandra flushes mem1 first and,
> > once a fifth memtable (mem5) is full, flushes mem2? Is that possible?
> No.
>
> > We were using Cassandra for this with 40 read QPS before, but once the
> > read QPS increased, the system's IO_WAIT rose heavily and we got a lot of
> > query timeouts (we set the timeout to 10 ms).
> Look at the output of nodetool cfhistograms for the CF. Look at the read
> latency column: the number on the left is microseconds and the number in
> the read latency column is how many local reads took that long. Also look
> at the SSTables column; this is the number of SSTables that were involved
> in the read.
>
> Consider increasing the rpc_timeout to reduce the timeout errors until you
> reduce the read latency.
>
> Is the read a range scan or selecting by row key?
> When you do the read, do you select all columns in the row or do you
> select columns by name? The latter is more performant.
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/04/2013, at 12:22 AM, Stanley Xu <wenhao.xu@gmail.com> wrote:
>
> > Dear buddies,
> >
> > We are using Cassandra to handle a tech scenario like the following:
> >
> > 1. A table that uses a Long as the key and has one and only one Integer
> > as a ColumnFamily, with a TTL of 2 hours.
> > 2. The WPS (writes per second) is 45,000; the QPS (reads per second)
> > would be about 30 to 200.
> > 3. There isn't a "hot zone" for reads (each query is for a different
> > key), but most of the reads will hit data written in the last 30 minutes.
> > 4. All writes are new keys with new values; there are no overwrites.
> >
> >
> > We were using Cassandra for this with 40 read QPS before, but once the
> > read QPS increased, the system's IO_WAIT rose heavily and we got a lot of
> > query timeouts (we set the timeout to 10 ms).
> >
> > As I understand it, the main reason is that most of the queries will hit
> > the disk with our configuration.
> >
> > I am wondering whether the following would help us handle the load.
> >
> > 1. Increase the size of the memtable so that most reads are served from
> > the memtable. Since the memtable has not been flushed to disk yet, a query
> > against the SSTables will be filtered out by the bloom filter, so no disk
> > seek will happen.
> >
> > But our major concern is that once a large memtable is flushed to disk,
> > new incoming queries will all go to disk and the timeouts will still
> > happen.
> >
> > Would it be possible to configure Cassandra to keep something like a
> > queue of memtables in memory, for example four memtables (mem1, mem2,
> > mem3, mem4) ordered by time, so that Cassandra flushes mem1 first and,
> > once a fifth memtable (mem5) is full, flushes mem2? Is that possible?
> >
> >
> > Best wishes,
> > Stanley Xu
>
>
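
The figures quoted in this thread lend themselves to a quick back-of-envelope check. The sketch below is only an illustration: it multiplies the stated write rate by the 30-minute read window and the 2-hour TTL, and it computes the share of reads slower than the 10 ms timeout from (microseconds, count) pairs of the kind Aaron describes reading off the histogram. BYTES_PER_ENTRY and the sample histogram are made-up assumptions, not measured Cassandra values, and the estimate ignores replication, node count, and per-row overhead.

```python
# Back-of-envelope sketch using the figures quoted in this thread.

WRITES_PER_SECOND = 45_000      # from the thread
TTL_SECONDS = 2 * 60 * 60       # 2-hour TTL, from the thread
HOT_WINDOW_SECONDS = 30 * 60    # most reads hit the last 30 minutes of writes
TIMEOUT_US = 10_000             # 10 ms timeout, expressed in microseconds
BYTES_PER_ENTRY = 100           # ASSUMPTION: hypothetical in-memory cost per key


def live_entries(writes_per_second, window_seconds):
    """Distinct keys written during the window (every write is a new key)."""
    return writes_per_second * window_seconds


def fraction_slower_than(histogram, threshold_us):
    """Share of reads above threshold_us.

    histogram is a list of (latency_us, read_count) pairs.
    """
    total = sum(count for _, count in histogram)
    if total == 0:
        return 0.0
    slow = sum(count for latency_us, count in histogram
               if latency_us > threshold_us)
    return slow / total


if __name__ == "__main__":
    hot = live_entries(WRITES_PER_SECOND, HOT_WINDOW_SECONDS)
    live = live_entries(WRITES_PER_SECOND, TTL_SECONDS)
    print(f"keys written in the 30-minute window: {hot:,}")
    print(f"keys alive within the 2-hour TTL: {live:,}")
    print(f"rough 30-minute window size: {hot * BYTES_PER_ENTRY / 1e9:.1f} GB")

    # Made-up sample data, only to show the calculation.
    sample = [(1_000, 50), (5_000, 30), (12_000, 15), (50_000, 5)]
    share = fraction_slower_than(sample, TIMEOUT_US)
    print(f"reads slower than 10 ms: {share:.0%}")
```

With the thread's numbers, the 30-minute window alone covers 81 million keys, which gives a sense of how much data would have to stay in memory for most reads to avoid disk.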
