cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: repair question
Date Mon, 23 May 2011 17:48:26 GMT
On Mon, May 23, 2011 at 7:17 PM, Daniel Doubleday
<daniel.doubleday@gmx.net> wrote:
> Hi all
>
> I'm a bit lost: I tried a repair yesterday with only one CF and that didn't really work
the way I expected, but I thought that was a bug which only affects that special case.
>
> So I tried again for all CFs.
>
> I started with a nicely compacted machine with around 320GB of load. Total disc space
on this node was 1.1TB.
>
> After it went out of disc space (meaning I received around 700GB of data) I had a very
brief look at the repair code again and it seems to me that the repairing node will get all
data for its range from all its neighbors.

The repaired node is supposed to get data from its
neighbors only for rows it is not in sync with. How much
is transferred is supposed to depend on how far the node
is out of sync compared to the other nodes.

Now there are a number of things that could make it repair more
than what you would hope. For instance:
  1) Even if only one column differs for a row, the full row is
      repaired. If you have a small number of huge rows, that
      can amount to quite a lot of data transferred uselessly.
  2) The Merkle tree (which determines whether two rows are
      in sync) doesn't necessarily have one hash per row, so in
      theory one column that is out of sync may imply the
      repair of more than one row.
  3) https://issues.apache.org/jira/browse/CASSANDRA-2324 (which
      is fixed in 0.8)
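To make point 2) concrete, here is a minimal, illustrative sketch (not Cassandra's actual implementation; the range layout, hashing, and names are assumptions) of why a Merkle tree with fewer leaves than rows over-repairs: each leaf hashes a whole range of rows, so one differing column flags the entire range for streaming.

```python
import hashlib

def leaf_hash(rows):
    """Hash every (key, columns) pair in a leaf's row range into one digest."""
    h = hashlib.sha256()
    for key, columns in rows:
        h.update(repr((key, sorted(columns.items()))).encode())
    return h.hexdigest()

def ranges_to_repair(local_leaves, remote_leaves):
    """Compare the leaf hashes of two trees covering the same token ranges.

    Each tree maps a token range to the rows it covers. Any leaf whose
    hash differs is streamed in full, even if only one column of one
    row in that range actually changed.
    """
    return [token_range
            for token_range, rows in local_leaves.items()
            if leaf_hash(rows) != leaf_hash(remote_leaves[token_range])]

# One column of row "b" differs, so the whole (0, 100) range -- rows
# "a" and "b" together -- gets repaired, not just row "b".
local = {(0, 100): [("a", {"c1": "x"}), ("b", {"c1": "y"})],
         (100, 200): [("c", {"c1": "z"})]}
remote = {(0, 100): [("a", {"c1": "x"}), ("b", {"c1": "y2"})],
          (100, 200): [("c", {"c1": "z"})]}
print(ranges_to_repair(local, remote))  # [(0, 100)]
```

The granularity tradeoff is visible here: more leaves per tree means less data streamed per mismatch, at the cost of building and comparing a bigger tree.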

Fortunately, the chance of getting hit by 1) is inversely
proportional to the chance of getting hit by 2), and vice versa.

Anyway, the amount of excess data you're seeing is not something
I would expect unless the node is really completely out of sync
with all the other nodes.
So in light of this, do you have more info on your own case?
(Do you have lots of small rows, or a few large ones? Did you
expect the node to be widely out of sync with the other nodes? Etc.)


--
Sylvain

>
> Is that true, and if so, is it the intended behavior? If so, one would rather need 5-6
times the disc space, given that the compactions that need to run after the sstable rebuild
also need temp disc space.
>
> Cheers,
> Daniel
