cassandra-user mailing list archives

From Ryan Svihla <rsvi...@datastax.com>
Subject Re: Changing replication factor of Cassandra cluster
Date Tue, 16 Dec 2014 17:47:14 GMT
Repair performance varies heavily with a large number of factors. Hours for
1 node to finish is within the range of what I see in the wild, but there
are so many factors that it's impossible to say whether that is good or bad
for your cluster. Factors that matter include:

   1. speed of disk I/O
   2. amount of RAM and CPU on each node
   3. network interface speed
   4. whether this is multi-DC or not
   5. whether vnodes are enabled or not
   6. JVM tunings
   7. compaction settings
   8. current load on the cluster
   9. streaming settings

Suffice it to say that improving repair performance is a full-on tuning
exercise. Note that your current operation is going to be slower than a
traditional repair, as you're streaming copies of data around and not just
doing the normal Merkle tree work.
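
To put rough numbers on the streaming part, here is a back-of-the-envelope
sketch using the figures from your message (20 nodes, 500 GB, RF 1 to 3).
It assumes data is spread evenly across nodes, and the 200 Mbit/s value is
only an assumed streaming throttle, not something measured on your cluster.
The result is a lower bound for moving the data alone; validation,
compaction and other load are not included.

```python
def repair_streaming_estimate(total_gb, nodes, old_rf, new_rf, stream_mbit_per_s):
    """Rough per-node estimate; assumes data is spread evenly across nodes."""
    per_node_before = total_gb * old_rf / nodes
    per_node_after = total_gb * new_rf / nodes
    extra_gb = per_node_after - per_node_before
    # 1 GB = 8000 megabits (decimal units)
    seconds = extra_gb * 8000 / stream_mbit_per_s
    return per_node_before, per_node_after, extra_gb, seconds


# 200 Mbit/s is an assumed throttle value, used here for illustration only.
before, after, extra, secs = repair_streaming_estimate(500, 20, 1, 3, 200)
print(f"{before:.0f} GB -> {after:.0f} GB per node, ~{extra:.0f} GB to stream, "
      f">= {secs / 60:.0f} min at 200 Mbit/s")
```

So each node ends up holding about three times what it had before, which
is a very different job from a repair that only reconciles small
differences.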

Restoring from backup to a new cluster (including how to handle token
ranges) is discussed in detail here
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html
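
On the token question: as I read that procedure, each node in the new
cluster is given the same tokens as the old node whose snapshot it
receives, set through initial_token in cassandra.yaml. A minimal sketch of
building that entry is below. The helper name and the token values are
placeholders for illustration; the real tokens would come from your old
cluster.

```python
def initial_token_line(tokens):
    """Build a cassandra.yaml initial_token entry from one node's tokens."""
    if not tokens:
        raise ValueError("no tokens recorded for this node")
    return "initial_token: " + ",".join(str(t) for t in sorted(tokens))


# Placeholder token values, for illustration only.
old_node_tokens = [3074457345618258602, -9223372036854775808, -3074457345618258603]
print(initial_token_line(old_node_tokens))
```

With vnodes enabled each node has many tokens, so the list is much longer,
but the idea is the same.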


On Mon, Dec 15, 2014 at 4:14 PM, Pranay Agarwal <agarwalpranaya@gmail.com>
wrote:
>
> Hi All,
>
>
> I have 20 nodes cassandra cluster with 500gb of data and replication
> factor of 1. I increased the replication factor to 3 and ran nodetool
> repair on each node one by one as the docs says. But it takes hours for 1
> node to finish repair. Is that normal or am I doing something wrong?
>
> Also, I took backup of cassandra data on each node. How do I restore the
> graph in a new cluster of nodes using the backup? Do I have to have the
> tokens range backed up as well?
>
> -Pranay
>


-- 

Ryan Svihla

Solution Architect


