cassandra-user mailing list archives

From Andrey Ilinykh <ailin...@gmail.com>
Subject Re: Why data tripled in size after repair?
Date Thu, 27 Sep 2012 17:33:15 GMT
On Thu, Sep 27, 2012 at 9:52 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>> I don't understand why it copied data twice. In worst case scenario it
>> should copy everything (~90G)
>
> Sadly no, repair is currently peer-to-peer based (there is a ticket to
> fix it: https://issues.apache.org/jira/browse/CASSANDRA-3200, but
> that's not trivial). This means that you can end up with RF times the
> data after a repair. Obviously that should be a worst-case scenario, as
> it implies everything is repaired, but the triplication is a known
> problem, and not an easy one to fix.

I see. That explains why I get 85G + 85G instead of 90G. But after the
next repair I have six extra files of 75G each. How is that possible?
It looks like repair is done per sstable, not per CF. Could that be
the case?
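
To make the arithmetic above concrete, here is a minimal sketch of the
worst case Sylvain describes ("RF times the data"). The replication
factor of 3 is an assumption inferred from the "tripled" in the subject
line, and the function names are hypothetical, not part of Cassandra.

```python
def worst_case_total_gb(local_gb, replication_factor):
    """Worst-case on-disk size after a peer-to-peer repair.

    Per the explanation quoted above, a node can end up with
    RF times its data if every peer streams a full copy.
    """
    return local_gb * replication_factor


def worst_case_streamed_gb(local_gb, replication_factor):
    """Data received from peers in that worst case (total minus local)."""
    return worst_case_total_gb(local_gb, replication_factor) - local_gb


# Assumed values: ~90G of local data, RF = 3 (an assumption, see above).
local_gb = 90
rf = 3
print(worst_case_total_gb(local_gb, rf))     # 270
print(worst_case_streamed_gb(local_gb, rf))  # 180
```

Under those assumptions the observed 85G + 85G = 170G of streamed data
is close to the 180G worst case.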

>
> Is it possible that each time you've run repair, one of the nodes in
> the cluster was very out of sync with the other nodes? Maybe a node
> that was down for a long time?
>
No, nodes go down from time to time (OOM), but I restart them
automatically. What is specific to my setup is that I use the
order-preserving partitioner and intensively update every 5th or 10th
row.
As far as I understand, because of that, when the Merkle tree is
calculated, every range contains several "hot" rows. These rows are
good candidates to be inconsistent. There is one thing I don't
understand. Does the Merkle tree calculation use only sstables
flushed to disk, or does it use memtables as well?
Let's say I have a "hot" row which sits in memory on one node but
has been flushed on another. Is there any difference in the Merkle
trees?
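
The concern about "hot" rows can be illustrated with a toy sketch. It
is not Cassandra's implementation: integer row keys, a fixed
power-of-two number of leaf ranges, SHA-256, and all function names
are assumptions made for illustration. It only shows that a few
out-of-sync rows spread evenly over the key space make every leaf
range mismatch, while the same number of rows clustered together
affect a single range.

```python
import hashlib


def leaf_hashes(rows, num_leaves, key_space):
    """Hash the rows of each equal-width key range into one leaf digest."""
    width = key_space // num_leaves
    buckets = [hashlib.sha256() for _ in range(num_leaves)]
    for key in sorted(rows):
        leaf = min(key // width, num_leaves - 1)
        buckets[leaf].update(f"{key}={rows[key]}".encode())
    return [b.digest() for b in buckets]


def build_tree(leaves):
    """Return tree levels: levels[0] is the leaves, levels[-1] is [root].

    Assumes len(leaves) is a power of two.
    """
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = [
            hashlib.sha256(prev[i] + prev[i + 1]).digest()
            for i in range(0, len(prev), 2)
        ]
        levels.append(parents)
    return levels


def differing_leaves(tree_a, tree_b, level=None, index=0):
    """Walk both trees from the root and collect mismatching leaf indexes."""
    if level is None:
        level = len(tree_a) - 1
    if tree_a[level][index] == tree_b[level][index]:
        return []  # identical subtree, nothing to stream
    if level == 0:
        return [index]
    return (differing_leaves(tree_a, tree_b, level - 1, 2 * index)
            + differing_leaves(tree_a, tree_b, level - 1, 2 * index + 1))


def ranges_to_stream(replica_a, replica_b, num_leaves=8, key_space=80):
    tree_a = build_tree(leaf_hashes(replica_a, num_leaves, key_space))
    tree_b = build_tree(leaf_hashes(replica_b, num_leaves, key_space))
    return differing_leaves(tree_a, tree_b)


# 80 rows split into 8 ranges of 10 keys each.
base = {k: "v1" for k in range(80)}

# Every 10th row is stale on the other replica: 8 rows out of 80.
hot_spread = dict(base)
for k in range(0, 80, 10):
    hot_spread[k] = "v2"

# The same number of stale rows, but all inside the first range.
hot_clustered = dict(base)
for k in range(8):
    hot_clustered[k] = "v2"

spread_result = ranges_to_stream(base, hot_spread)
clustered_result = ranges_to_stream(base, hot_clustered)
print(spread_result)     # [0, 1, 2, 3, 4, 5, 6, 7]
print(clustered_result)  # [0]
```

In this toy model only 10% of the rows differ in both cases, yet the
evenly spread updates mark all eight ranges as out of sync.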

Thank you,
  Andrey
