cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Lowe <ryanjl...@gmail.com>
Subject Re: Scaling Out / Replication Factor too?
Date Sun, 28 Aug 2011 23:24:40 GMT
Edward,

This information (and the presentation) was very helpful... but I still have
a few more questions.

During a test run, I brought up 16 servers with RF of 2 and Read Repair
Chance of 1.0.  However, like I mentioned, the load was only on a few
servers.  I attempted to increase the Key and Row caching to completely
cover our entire dataset, but still CPU load on the few servers was still
extremely high.  Does the cache exist on every server in the cluster?  Would
turning off Read Repair (or turning it dramatically down) help reduce the
load on the servers with the heavy load?


Thanks!
Ryan


On Sun, Aug 28, 2011 at 3:49 PM, Edward Capriolo <edlinuxguru@gmail.com>wrote:

> So my question is this... if I bring in 20+ nodes, should I increase the
> replication factor as well?
>
> Each write is done to all natural endpoints of the data. If you set
> replication factor equal to number of nodes write operations do not scale.
> This is because each write has to happen on every node. The same thing is
> true with read operations, even if you ready at CL.ONE read repair will
> perform a read on all replicates.
>
> * However there is one caveat to this advice, I covered it in this
> presentation I did.
>
> http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
>
> The read_repair_chance controls how often a read repair is triggered. You
> can increase Replication Factor and lower read_repair_chance. This gives you
> many severs capable of serving the same read without burdening by doing
> repair reads across the other 19 nodes.
>
> However this is NOT the standard method to scale out. The standard, and
> probably better way in all but a few instances, is to leave the replication
> factor alone and add more nodes.
>
> Normally, people set Replication Factor at 3. This gives 3 nodes to serve
> reads, as long as their dataset is small, which is true in your case, these
> reads are heavily cached. You would need a very high number of reads/writes
> to bottleneck any node.
>
> Raising and lowering replication factor is not the way to go, changing the
> replication factor involves more steps then just changing the variable as
> does growing and shrinking the cluster.
>
> What to do about idling servers is another question. We have thought about
> having our idling web servers join our hadoop cluster at night and then
> leave again in the morning :) Maybe you can have some fun with your
> cassandra gear in its idle time.
>
>
>
> On Sun, Aug 28, 2011 at 2:47 PM, Ryan Lowe <ryanjlowe@gmail.com> wrote:
>
>> We are working on a system that has super heavy traffic during specific
>> times... think of sporting events.  Other times we will get almost 0
>> traffic.  In order to handle the traffic during the events, we are planning
>> on scaling out cassandra into a very large cluster.   The size of our data
>> is still quite small.  A single event's data might be 100MB in size max, but
>> we will be inserting that data very rapidly and needing to read it at the
>> same time.
>>
>> Since we have very slow times, we use a replication factor of 2 and a
>> cluster size of 2 to handle the traffic... it handles it perfectly.
>>
>> Since dataset size is not really an issue, what is the best way to scale
>> out for us?  We are using order preserving partitioners to do range
>> scanning, so last time I tried to scale out our cluster we ended up with
>> very uneven load.  Then the few nodes that contained that data were very
>> swamped, while the rest were barely touched.
>>
>> Other note is that since we have very little data, and lots of memory, we
>> turned on key and row cache almost as high as we could go.
>>
>> So my question is this... if I bring in 20+ nodes, should I increase the
>> replication factor as well?  It would seem to make sense that more
>> replication factor would help distribute load?  Or does it just mean that
>> writes take even longer?  What are some other suggestions on how to do scale
>> up (and then back down) for a system that gets very high traffic in known
>> small time windows.
>>
>>
>>
>> Let me know if you need more info.
>>
>> Thanks!
>> Ryan
>>
>
>

Mime
View raw message