incubator-cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: what's the difference between repair CF separately and repair the entire node?
Date Wed, 14 Sep 2011 07:27:48 GMT
Is 0.8 ready for production use? As far as I know, many companies,
including reddit.com, are currently using 0.7. How do they get around the
repair problem?

On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:

> On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu <springrider@gmail.com> wrote:
> > I don't want to repair one CF at a time either.
> > The "node repair" has taken a week and is still running. compactionstats
> > and netstats show nothing running on any node, and there is no error
> > message and no exception. I really have no idea what it was doing.
>
> To add to the list of things repair does wrong in 0.7: if one of the nodes
> participating in the repair (that is, any node that shares a range with the
> node on which repair was started) goes down, even for a short time, the
> repair will simply hang forever doing nothing, and no specific error
> message will be logged. That could be what happened. Again, recent
> releases of 0.8 fix that too.
>
> --
> Sylvain
>
> > I stopped it yesterday. Maybe I should run repair again with compaction
> > disabled on all nodes?
> > Thanks!
> >
> > On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
> > <peter.schuller@infidyne.com> wrote:
> >>
> >> > I think it is a serious problem since I cannot "repair"... I am
> >> > using Cassandra on production servers. Is there some way to fix it
> >> > without upgrading? I heard that 0.8.x is still not quite ready for
> >> > production environments.
> >>
> >> It is a serious issue if you really need to repair one CF at a time.
> >> However, looking at your original post, it seems this is not
> >> necessarily your issue. Do you need to, or was your concern rather the
> >> overall time repair took?
> >>
> >> There are other things improved in 0.8 that affect 0.7. In
> >> particular, (1) in 0.7 compaction, including the validating compactions
> >> that are part of repair, is non-concurrent, so if your repair starts
> >> while a long-running compaction is in progress it will have to wait;
> >> and (2), semi-related, the Merkle tree calculation that is part of
> >> repair/anti-entropy may happen "out of sync" if one of the participating
> >> nodes happens to be busy with compaction. This in turn causes
> >> additional data to be sent as part of repair.
> >>
> >> That might be why your immediately following repair took a long time,
> >> but it's difficult to tell.
> >>
> >> If you're having issues with repair and large data sets, I would
> >> generally say that upgrading to 0.8 is recommended. However, if you're
> >> on 0.7.4, beware of
> >> https://issues.apache.org/jira/browse/CASSANDRA-3166
> >>
> >> --
> >> / Peter Schuller (@scode on twitter)
> >
> >
>
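
Point (2) in Peter's reply is easier to see with a toy model. The sketch below is a simplified illustration, not Cassandra's actual MerkleTree implementation: the SHA-256 hashing, the fixed token ranges and the helper names (leaf_hashes, build_tree, differing_ranges) are all assumptions made for the example. Each replica hashes its rows per token range into leaves, the leaves are combined into a tree, and repair descends only into subtrees whose hashes differ; every differing leaf range would be streamed. If one replica computes its tree later (for instance after waiting behind a long compaction), writes that landed in between show up as differences even though no replica was actually out of date.

```python
import hashlib
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # hypothetical half-open token range [start, end)


def leaf_hashes(rows: Dict[int, str], ranges: List[Range]) -> List[bytes]:
    """Hash the rows whose token falls in each range (one leaf per range)."""
    leaves = []
    for start, end in ranges:
        h = hashlib.sha256()
        for token in sorted(t for t in rows if start <= t < end):
            h.update(f"{token}={rows[token]};".encode())
        leaves.append(h.digest())
    return leaves


def build_tree(leaves: List[bytes]) -> List[List[bytes]]:
    """Build Merkle levels bottom-up; levels[-1][0] is the root hash."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256(b"".join(prev[i:i + 2])).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def differing_ranges(a, b, ranges: List[Range]) -> List[Range]:
    """Walk two same-shaped trees top-down, descending only where hashes differ."""
    out: List[Range] = []

    def walk(level: int, idx: int) -> None:
        if a[level][idx] == b[level][idx]:
            return  # identical subtree: nothing to stream for these ranges
        if level == 0:
            out.append(ranges[idx])  # mismatched leaf: this range gets streamed
            return
        for child in (2 * idx, 2 * idx + 1):
            if child < len(a[level - 1]):
                walk(level - 1, child)

    walk(len(a) - 1, 0)
    return out


ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]
replica_a = {10: "x", 30: "y", 60: "z", 90: "w"}
tree_a = build_tree(leaf_hashes(replica_a, ranges))

# Trees computed at the same moment over the same data: nothing to stream.
print(differing_ranges(tree_a, build_tree(leaf_hashes(dict(replica_a), ranges)), ranges))

# Replica B builds its tree later, after two new writes arrived that A's
# earlier snapshot does not contain: those ranges now look inconsistent.
replica_b_later = {**replica_a, 40: "v", 80: "u"}
print(differing_ranges(tree_a, build_tree(leaf_hashes(replica_b_later, ranges)), ranges))
```

The top-down walk is what keeps the exchange cheap when replicas agree: one root comparison settles it. The granularity of the leaves is also why a single late write marks a whole range for streaming rather than just the row that changed.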
