incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Read before Write
Date Fri, 27 Aug 2010 19:23:18 GMT
On Fri, Aug 27, 2010 at 1:26 PM, Ran Tavory <rantav@gmail.com> wrote:
> I haven't benchmarked so it's purely theoretical.
> If there's no caching then I'm pretty sure just writing would yield better
> performance.
> If you do cache rows/keys it really depends on your hit ratio. Naturally if
> you have a small data set and high cache ratio and use row caching I'm
> pretty sure it's better to read first.
> Although writes are an order of magnitude faster than reads, if you have a
> high write rate then cassandra might throttle you at different bottlenecks,
> depending on your hardware and data. For example, disk is often a
> bottleneck (and you can tweak storage-conf to improve that), sometimes memory
> is under pressure, and I have also seen CPU pressure, although it's less common.
> You need to also keep in mind that even if you write the same value but with
> a newer timestamp then cassandra will have to run compactions and that's
> where disk/mem is usually bottlenecking.
> Bottom line - if you can cache (have enough mem) and there's a good hit ratio,
> cache entire rows and read first. If not, always write first and make sure
> compactions aren't killing you; if they are, tweak storage-conf to do fewer
> compactions.
>
> On Fri, Aug 27, 2010 at 5:44 PM, Chen Xinli <chen.daqi@gmail.com> wrote:
>>
>> I think just writing all the time is much better, as most of the replacements
>> will be done in the memtable.
>>
>> Also, you should set a large memtable size compared with the average
>> row size.
>>
>>
>> 2010/8/27 Daniel Doubleday <daniel.doubleday@gmx.net>
>>>
>>> Hi people
>>>
>>> I was wondering if anyone already benchmarked such a situation:
>>>
>>> I have:
>>>
>>> day of year (row key) -> SomeId (column key) -> byte[0]
>>>
>>> I need to make sure that I write SomeId, but in around 80% of the cases
>>> it will be already present (so I would essentially replace it with itself).
>>> RF will be 2.
>>>
>>> So should I rather just write all the time (given that cassandra is so
>>> fast on write) or should I read and write only if not present?
>>>
>>> Cheers,
>>> Daniel
>>
>>
>> --
>> Best Regards,
>> Chen Xinli
>
>

Read before write is usually a bad idea in Cassandra.

We have a multi-node cluster with ~100 GB per node. We have a
fairly substantial 800,000 item row cache, which sees about a 70% hit
rate. Our application measures writes at QUORUM at about 1 ms and reads
at ONE at 7-10 ms; reads seemed to be about 3-6 ms when the data was
around 70 GB per node.

Given that a write takes 1 ms and a read takes 7 ms, and that reads
are more intensive, I would almost never advocate reading before
writing.
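
As a rough back-of-the-envelope sketch of that comparison: the function
name and the simple cost model below are illustrative assumptions, not
measurements. The latencies are the ones quoted above, and the 80%
"already present" figure comes from Daniel's original question. The model
ignores compaction cost, cache effects and concurrency.

```python
def expected_latency_ms(read_ms, write_ms, present_fraction):
    """Return (read_first, write_always) expected latency per operation, in ms.

    Hypothetical cost model: latencies simply add up, nothing runs in parallel.
    """
    # Read-first: always pay for the read, then write only when the column is absent.
    read_first = read_ms + (1.0 - present_fraction) * write_ms
    # Write-always: one write per operation, no read at all.
    write_always = write_ms
    return read_first, write_always


# 7 ms read, 1 ms write, column already present in 80% of the cases.
read_first, write_always = expected_latency_ms(
    read_ms=7.0, write_ms=1.0, present_fraction=0.8
)
print(f"read-first: {read_first:.1f} ms, write-always: {write_always:.1f} ms")
```

Under this model, reading first only wins when the read costs less than
present_fraction * write_ms, which would be under 0.8 ms here.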

Edward
