incubator-cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: what's the difference between repair CF separately and repair the entire node?
Date Wed, 14 Sep 2011 08:53:32 GMT
thanks a lot for the help!

I have read the post and think 0.8 might be good enough for me, especially
0.8.5.

Also, changing gc_grace_seconds is an acceptable solution.
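
For reference, a gc_grace change in the 0.7/0.8-era cassandra-cli looks roughly like the following sketch. The keyspace and column family names are placeholders, and the exact attribute name (`gc_grace`) should be verified against your version's CLI help before running:

```
connect localhost/9160;
use MyKeyspace;
update column family MyCF with gc_grace = 2592000;
```

Here 2592000 is 30 days expressed in seconds; the change takes effect per column family, not cluster-wide.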



On Wed, Sep 14, 2011 at 4:03 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:

> On Wed, Sep 14, 2011 at 9:27 AM, Yan Chunlu <springrider@gmail.com> wrote:
> > is 0.8 ready for production use?
>
> some related discussion here:
> http://www.mail-archive.com/user@cassandra.apache.org/msg17055.html
> but my personal answer is yes.
>
> > As far as I know, many companies, including reddit.com, are currently
> > using 0.7; how do they get around the repair problem?
>
> Repair problems in 0.7 don't hit everyone equally. For some people, it
> works relatively well, even if not in the most efficient way. Also, for
> some workloads (if you don't do many deletes, for instance), you can set a
> big gc_grace_seconds value (say, a month) and only run repair that often,
> which can make repair inefficiencies more bearable.
> That being said, I can't speak for "many companies", but I do advise
> evaluating
> an upgrade to 0.8.
>
> --
> Sylvain
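
The gc_grace_seconds trade-off described above comes down to simple arithmetic: a full repair must complete on every node at least once per gc_grace_seconds window, or tombstones can be purged before every replica has seen the delete, resurrecting deleted data. A minimal sketch of that check (the 20% safety margin and the function name are illustrative assumptions, not from the thread):

```python
# Sketch: verify a planned repair interval stays safely inside
# gc_grace_seconds, so tombstones are not purged before every replica
# has seen the delete.

GC_GRACE_SECONDS = 30 * 24 * 3600  # "say a month", as suggested above


def repair_interval_ok(repair_every_seconds, gc_grace=GC_GRACE_SECONDS,
                       margin=0.8):
    """Repairs should finish well before gc_grace expires; the 20%
    safety margin here is an arbitrary illustrative choice."""
    return repair_every_seconds <= gc_grace * margin


print(repair_interval_ok(7 * 24 * 3600))   # weekly repair, one-month grace: True
print(repair_interval_ok(35 * 24 * 3600))  # longer than gc_grace: False
```

The margin exists because repair itself can take days on large data sets, as this thread illustrates, so the interval between repair *starts* is not the interval between repair *completions*.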
>
> >
> > On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne <sylvain@datastax.com>
> > wrote:
> >>
> >> On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu <springrider@gmail.com> wrote:
> >> > I don't want to repair one CF at a time either.
> >> > The "node repair" took a week and was still running; compactionstats and
> >> > netstats showed nothing running on any node, and there was also no error
> >> > message and no exception, so I really had no idea what it was doing.
> >>
> >> To add to the list of things repair does wrong in 0.7, we'll have to add
> >> that if one of the nodes participating in the repair (so any node that
> >> shares a range with the node on which repair was started) goes down
> >> (even for a short time), then the repair will simply hang forever, doing
> >> nothing. And no specific error message will be logged. That could be
> >> what happened. Again, recent releases of 0.8 fix that too.
> >>
> >> --
> >> Sylvain
> >>
> >> > I stopped it yesterday. Maybe I should run repair again while disabling
> >> > compaction on all nodes?
> >> > Thanks!
> >> >
> >> > On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
> >> > <peter.schuller@infidyne.com> wrote:
> >> >>
> >> >> > I think it is a serious problem since I can not "repair"..... I am
> >> >> > using Cassandra on production servers. Is there some way to fix it
> >> >> > without upgrading? I heard that 0.8.x is still not quite ready for
> >> >> > production environments.
> >> >>
> >> >> It is a serious issue if you really need to repair one CF at a time.
> >> >> However, looking at your original post, it seems this is not
> >> >> necessarily your issue. Do you need to, or was your concern rather
> >> >> the overall time repair took?
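
To make the distinction in the subject line concrete, the 0.7/0.8-era nodetool invocations look roughly like this sketch (host, keyspace, and column family names are placeholders; check `nodetool help` on your version for the exact argument order):

```shell
# Repair every column family in every keyspace on this node:
nodetool -h 127.0.0.1 repair

# Repair only one keyspace:
nodetool -h 127.0.0.1 repair MyKeyspace

# Repair a single column family within that keyspace:
nodetool -h 127.0.0.1 repair MyKeyspace MyColumnFamily
```

Repairing per CF bounds the work of each run, which can help when a whole-node repair runs for days, at the cost of having to track which CFs have been repaired within the gc_grace_seconds window.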
> >> >>
> >> >> There are other things that are improved in 0.8 relative to 0.7. In
> >> >> particular, (1) in 0.7, compaction, including the validating
> >> >> compactions that are part of repair, is non-concurrent, so if your
> >> >> repair starts while a long-running compaction is going it will have
> >> >> to wait; and (2) semi-related is that the merkle tree calculations
> >> >> that are part of repair/anti-entropy may happen "out of sync" if one
> >> >> of the participating nodes happens to be busy with compaction. This
> >> >> in turn causes additional data to be sent as part of repair.
> >> >>
> >> >> That might be why your immediately following repair took a long time,
> >> >> but it's difficult to tell.
> >> >>
> >> >> If you're having issues with repair and large data sets, I would
> >> >> generally say that upgrading to 0.8 is recommended. However, if
> >> >> you're on 0.7.4, beware of
> >> >> https://issues.apache.org/jira/browse/CASSANDRA-3166
> >> >>
> >> >> --
> >> >> / Peter Schuller (@scode on twitter)
> >> >
> >> >
> >
> >
>
