incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: counters + replication = awful performance?
Date Wed, 28 Nov 2012 00:14:50 GMT
Cassandra's counters read on increment. Additionally they are distributed
so that can be multiple reads on increment. If they are not fast enough and
you have avoided all tuning options add more servers to handle the load.

In many cases incrementing the same counter n times can be avoided.

Twitter's rainbird did just that. It avoided multiple counter increments by
batching them.

I have done a similar think using cassandra and Kafka.

https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java


On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@gmail.com> wrote:
> Hi, thanks for your suggestions.
>
> Regarding replicate=2 vs replicate=1 performance: I expected that below
> configurations will have similar performance:
> - single node, replicate = 1
> - two nodes, replicate = 2 (okay, this probably should be a bit slower due
> to additional overhead).
>
> However what I'm seeing is that second option (replicate=2) is about THREE
> times slower than single node.
>
>
> Regarding replicate_on_write -- it is, in fact, a dangerous option. As
JIRA
> discusses, if you make changes to your ring (moving tokens and such) you
> will *silently* lose data. That is on top of whatever data you might end
up
> losing if you run replicate_on_write=false and the only node that got the
> data fails.
>
> But what is much worse -- with replicate_on_write being false the data
will
> NOT be replicated (in my tests) ever unless you explicitly request the
cell.
> Then it will return the wrong result. And only on subsequent reads it will
> return adequate results. I haven't tested it, but documentation states
that
> range query will NOT do 'read repair' and thus will not force replication.
> The test I did went like this:
> - replicate_on_write = false
> - write something to node A (which should in theory replicate to node B)
> - wait for a long time (longest was on the order of 5 hours)
> - read from node B (and here I was getting null / wrong result)
> - read from node B again (here you get what you'd expect after read
repair)
>
> In essence, using replicate_on_write=false with rarely read data will
> practically defeat the purpose of having replication in the first place
> (failover, data redundancy).
>
>
> Or, in other words, this option doesn't look to be applicable to my
> situation.
>
> It looks like I will get much better performance by simply writing to two
> separate clusters rather than using single cluster with replicate=2. Which
> is kind of stupid :) I think something's fishy with counters and
> replication.
>
>
>
> Edward Capriolo wrote
>> I mispoke really. It is not dangerous you just have to understand what it
>> means. this jira discusses it.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-3868
>>
>> On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay &lt;
>
>> scottm@
>
>> &gt;wrote:
>>
>>>  We're having a similar performance problem.  Setting
>>> 'replicate_on_write:
>>> false' fixes the performance issue in our tests.
>>>
>>> How dangerous is it?  What exactly could go wrong?
>>>
>>> On 12-11-27 01:44 PM, Edward Capriolo wrote:
>>>
>>> The difference between Replication factor =1 and replication factor > 1
>>> is
>>> significant. Also it sounds like your cluster is 2 node so going from
>>> RF=1
>>> to RF=2 means double the load on both nodes.
>>>
>>>  You may want to experiment with the very dangerous column family
>>> attribute:
>>>
>>>  - replicate_on_write: Replicate every counter update from the leader to
>>> the
>>> follower replicas. Accepts the values true and false.
>>>
>>>  Edward
>>>  On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman <
>>>
>
>> mkjellman@
>
>>> wrote:
>>>
>>>> Are you writing with QUORUM consistency or ONE?
>>>>
>>>> On 11/27/12 9:52 AM, "Sergey Olefir" &lt;
>
>> solf.lists@
>
>> &gt; wrote:
>>>>
>>>> >Hi Juan,
>>>> >
>>>> >thanks for your input!
>>>> >
>>>> >In my case, however, I doubt this is the case -- clients are able to
>>>> push
>>>> >many more updates than I need to saturate replication_factor=2 case
>>>> (e.g.
>>>> >I'm doing as many as 6x more increments when testing 2-node cluster
>>>> with
>>>> >replication_factor=1), so bandwidth between clients and server should
>>>> be
>>>> >sufficient.
>>>> >
>>>> >Bandwidth between nodes in the cluster should also be quite sufficient
>>>> >since
>>>> >they are both in the same DC. But it is something to check, thanks!
>>>> >
>>>> >Best regards,
>>>> >Sergey
>>>> >
>>>> >
>>>> >Juan Valencia wrote
>>>> >> Hi Sergey,
>>>> >>
>>>> >> I know I've had similar issues with counters which were
bottle-necked
>>>> by
>>>> >> network throughput.  You might be seeing a problem with throughput
>>>> >>between
>>>> >> the clients and Cass or between the two Cass nodes.  It might not
be
>>>> >>your
>>>> >> case, but that was what happened to me :-)
>>>> >>
>>>> >> Juan
>>>> >>
>>>> >>
>>>> >> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir &lt;
>>>> >
>>>> >> solf.lists@
>>>> >
>>>> >> &gt; wrote:
>>>> >>
>>>> >>> Hi,
>>>> >>>
>>>> >>> I have a serious problem with counters performance and I can't
seem
>>>> to
>>>> >>> figure it out.
>>>> >>>
>>>> >>> Basically I'm building a system for accumulating some statistics
"on
>>>> >>>the
>>>> >>> fly" via Cassandra distributed counters. For this I need counter
>>>> >>>updates
>>>> >>> to
>>>> >>> work "really fast" and herein lies my problem -- as soon as
I
enable
>>>> >>> replication_factor = 2, the performance goes down the drain.
This
>>>> >>>happens
>>>> >>> in
>>>> >>> my tests using both 1.0.x and 1.1.6.
>>>> >>>
>>>> >>> Let me elaborate:
>>>> >>>
>>>> >>> I have two boxes (virtual servers on top of physical servers
rented
>>>> >>> specifically for this purpose, i.e. it's not a cloud, nor it
is
>>>> shared;
>>>> >>> virtual servers are managed by our admins as a way to limit
damage
>>>> as
>>>> I
>>>> >>> suppose :)). Cassandra partitioner is set to ByteOrderedPartitioner
>>>> >>> because
>>>> >>> I want to be able to do some range queries.
>>>> >>>
>>>> >>> First, I set up Cassandra individually on each box (not in a
>>>> cluster)
>>>> >>>and
>>>> >>> test counter increments performance (exclusively increments,
no
>>>> reads).
>>>> >>> For
>>>> >>> tests I use code that is intended to somewhat resemble the expected
>>>> >>>load
>>>> >>> pattern -- particularly the majority of increments create new
>>>> counters
>>>> >>> with
>>>> >>> some updating (adding) to already existing counters. In this
test
>>>> each
>>>> >>> single node exhibits respectable performance - something on
the
>>>> order
>>>> >>>of
>>>> >>> 70k
>>>> >>> (seventy thousand) increments per second.
>>>> >>>
>>>> >>> I then join both of these nodes into single cluster (using
>>>> SimpleSnitch
>>>> >>> and
>>>> >>> SimpleStrategy, nothing fancy yet). I then run the same test
using
>>>> >>> replication_factor=1. The performance is on the order of 120k
>>>> >>>increments
>>>> >>> per
>>>>>>> 'Like' us on Facebook for exclusive content and other resources
on
all
>>>> Barracuda Networks solutions.
>>>>
>>>> Visit http://barracudanetworks.com/facebook
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> *Scott McKay*, Sr. Software Developer
>>> MailChannels
>>>
>>> Tel: +1 604 685 7488 x 509
>>> www.mailchannels.com
>>>
>
>
>
>
>
> --
> View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7584011.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at
Nabble.com.
>

Mime
View raw message