incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: what's the difference between repair CF separately and repair the entire node?
Date Wed, 14 Sep 2011 08:03:24 GMT
On Wed, Sep 14, 2011 at 9:27 AM, Yan Chunlu <springrider@gmail.com> wrote:
> is 0.8 ready for production use?

some related discussion here:
http://www.mail-archive.com/user@cassandra.apache.org/msg17055.html
but my personal answer is yes.

> As far as I know, many companies including reddit.com are currently using 0.7. How
> do they get around the repair problem?

Repair problems in 0.7 don't hit everyone equally. For some people, it works
relatively well, even if not in the most efficient way. Also, for some workloads
(if you don't do many deletes, for instance), you can set a big gc_grace_seconds
value (say a month) and only run repair that often, which can make repair's
inefficiencies more bearable.
That being said, I can't speak for "many companies", but I do advise evaluating
an upgrade to 0.8.
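
As a rough illustration of the gc_grace_seconds point above, here is a minimal
Python sketch that checks whether a repair schedule fits inside the grace
period. The function name, parameters, and the one-week repair duration are
hypothetical, chosen only for illustration; the rule it encodes is that each
repair should complete within gc_grace_seconds of the previous one.

```python
from datetime import timedelta

# Hypothetical helper, not part of Cassandra or nodetool.
def repair_fits_grace(gc_grace_seconds: int,
                      repair_interval: timedelta,
                      repair_duration: timedelta = timedelta(0)) -> bool:
    """Return True if a repair started every `repair_interval` and taking
    `repair_duration` to run completes within `gc_grace_seconds`."""
    worst_case = repair_interval + repair_duration
    return worst_case.total_seconds() <= gc_grace_seconds

# "Say a month": 30 days expressed in seconds (assumed value for illustration).
ONE_MONTH = int(timedelta(days=30).total_seconds())

# Repair every three weeks, assuming each run takes up to a week.
print(repair_fits_grace(ONE_MONTH, timedelta(days=21), timedelta(days=7)))
```

With a shorter grace period, the same monthly schedule would no longer fit, so
the two settings have to be chosen together.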

--
Sylvain

>
> On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne <sylvain@datastax.com>
> wrote:
>>
>> On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu <springrider@gmail.com> wrote:
>> > I don't want to repair one CF at a time either.
>> > The "node repair" took a week and was still running; compactionstats and
>> > netstream showed nothing running on any node, and there was no error
>> > message or exception, so I really have no idea what it was doing.
>>
>> To add to the list of things repair does wrong in 0.7: if one of the nodes
>> participating in the repair (that is, any node that shares a range with the
>> node on which repair was started) goes down, even for a short time, then the
>> repair will simply hang forever doing nothing, and no specific error message
>> will be logged. That could be what happened. Again, recent releases of 0.8
>> fix that too.
>>
>> --
>> Sylvain
>>
>> > I stopped it yesterday. Maybe I should run repair again with compaction
>> > disabled on all nodes?
>> > thanks!
>> >
>> > On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
>> > <peter.schuller@infidyne.com> wrote:
>> >>
>> >> > I think it is a serious problem since I cannot "repair"... I am
>> >> > using Cassandra on production servers. Is there some way to fix it
>> >> > without upgrading? I heard that 0.8.x is still not quite ready for
>> >> > production environments.
>> >>
>> >> It is a serious issue if you really need to repair one CF at a time.
>> >> However, looking at your original post it seems this is not
>> >> necessarily your issue. Do you need to, or was your concern rather the
>> >> overall time repair took?
>> >>
>> >> There are other things improved in 0.8 that affect 0.7. In
>> >> particular, (1) in 0.7 compaction, including the validating compactions
>> >> that are part of repair, is non-concurrent, so if your repair starts
>> >> while a long-running compaction is going it will have to wait,
>> >> and (2) semi-related, the merkle tree calculation that is part
>> >> of repair/anti-entropy may happen "out of sync" if one of the nodes
>> >> participating happens to be busy with compaction. This in turn causes
>> >> additional data to be sent as part of repair.
>> >>
>> >> That might be why your immediately following repair took a long time,
>> >> but it's difficult to tell.
>> >>
>> >> If you're having issues with repair and large data sets, I would
>> >> generally say that upgrading to 0.8 is recommended. However, if you're
>> >> on 0.7.4, beware of
>> >> https://issues.apache.org/jira/browse/CASSANDRA-3166
>> >>
>> >> --
>> >> / Peter Schuller (@scode on twitter)
>> >
>> >
>
>
