incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Unbalanced ring in Cassandra 0.8.4
Date Thu, 21 Jun 2012 18:45:02 GMT
>  Does cleanup only clean up keys that no longer belong to that node?
Yes.

I guess it could be an artefact of the bulk load; it hasn't been reported previously, though.
Try the cleanup and see how it goes. 
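
If it helps, here's a minimal sketch of doing that across the ring, one node at a time (cleanup rewrites every SSTable, so it's I/O heavy). The host list is taken from your nodetool ring output below; adjust the JMX port if you've changed it from the 0.8 default:

    #!/usr/bin/env python
    # Minimal sketch: run 'nodetool cleanup' against each node in turn.
    # Assumes nodetool can reach each node's JMX port (7199 by default).
    import subprocess

    HOSTS = ["172.17.72.91", "172.17.72.93", "172.17.72.95",
             "45.10.80.144", "45.10.80.146", "45.10.80.148"]

    for host in HOSTS:
        # Cleanup drops rows the node no longer owns; unlike
        # 'nodetool compact' it does not merge SSTables together.
        subprocess.check_call(["nodetool", "-h", host, "-p", "7199", "cleanup"])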

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/06/2012, at 1:34 AM, Raj N wrote:

> Nick, thanks for the response. Does cleanup only clean up keys that no longer belong to
> that node? Just to add more color: when I bulk loaded all my data into these 6 nodes, all
> of them had the same amount of data. After the first nodetool repair, the first node started
> having more data than the rest of the cluster, and since then it has never come back down.
> When I run cfstats on the node, the amount of data for every column family is almost twice
> the amount of data on the other nodes. This is true for the number-of-keys estimate as well.
> For one CF I see more than double the number of keys, and that's the largest CF as well,
> with 34 GB of data.
> 
> Thanks
> -Rajesh
> 
> On Wed, Jun 20, 2012 at 12:32 AM, Nick Bailey <nick@datastax.com> wrote:
> No. Cleanup will scan each sstable to remove data that is no longer
> owned by that specific node. It won't compact the sstables together
> however.
> 
> On Tue, Jun 19, 2012 at 11:11 PM, Raj N <raj.cassandra@gmail.com> wrote:
> > But won't that also run a major compaction, which is not recommended anymore?
> >
> > -Raj
> >
> >
> > On Sun, Jun 17, 2012 at 11:58 PM, aaron morton <aaron@thelastpickle.com>
> > wrote:
> >>
> >> Assuming you have been running repair, it can't hurt.
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 17/06/2012, at 4:06 AM, Raj N wrote:
> >>
> >> Nick, do you think I should still run cleanup on the first node?
> >>
> >> -Rajesh
> >>
> >> On Fri, Jun 15, 2012 at 3:47 PM, Raj N <raj.cassandra@gmail.com> wrote:
> >>>
> >>> I did run nodetool move. But that was when I was setting up the cluster,
> >>> which means I didn't have any data at that time.
> >>>
> >>> -Raj
> >>>
> >>>
> >>> On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey <nick@datastax.com> wrote:
> >>>>
> >>>> Did you start all your nodes at the correct tokens or did you balance
> >>>> by moving them? Moving nodes around won't delete unneeded data after
> >>>> the move is done.
> >>>>
> >>>> Try running 'nodetool cleanup' on all of your nodes.
> >>>>
> >>>> On Fri, Jun 15, 2012 at 12:24 PM, Raj N <raj.cassandra@gmail.com> wrote:
> >>>> > Actually I am not worried about the percentage. It's the data I am
> >>>> > concerned about. Look at the first node. It has 102.07 GB of data, and the
> >>>> > other nodes have around 60 GB (one has 69, but let's ignore that one). I am
> >>>> > not understanding why the first node has almost double the data.
> >>>> >
> >>>> > Thanks
> >>>> > -Raj
> >>>> >
> >>>> > On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey <nick@datastax.com> wrote:
> >>>> >>
> >>>> >> This is just a known problem with the nodetool output and multiple
> >>>> >> DCs. Your configuration is correct. The problem with nodetool is
> >>>> >> fixed in 1.1.1:
> >>>> >>
> >>>> >> https://issues.apache.org/jira/browse/CASSANDRA-3412
> >>>> >>
> >>>> >> On Fri, Jun 15, 2012 at 9:59 AM, Raj N <raj.cassandra@gmail.com> wrote:
> >>>> >> > Hi experts,
> >>>> >> >     I have a 6 node cluster across 2 DCs (DC1:3, DC2:3). I have
> >>>> >> > assigned tokens using the first strategy (adding 1) mentioned here:
> >>>> >> >
> >>>> >> > http://wiki.apache.org/cassandra/Operations?#Token_selection
> >>>> >> >
> >>>> >> > But when I run nodetool ring on my cluster, this is the result I
> >>>> >> > get:
> >>>> >> >
> >>>> >> > Address         DC  Rack  Status State   Load        Owns    Token
> >>>> >> >                                                               113427455640312814857969558651062452225
> >>>> >> > 172.17.72.91    DC1 RAC13 Up     Normal  102.07 GB   33.33%  0
> >>>> >> > 45.10.80.144    DC2 RAC5  Up     Normal  59.1 GB     0.00%   1
> >>>> >> > 172.17.72.93    DC1 RAC18 Up     Normal  59.57 GB    33.33%  56713727820156407428984779325531226112
> >>>> >> > 45.10.80.146    DC2 RAC7  Up     Normal  59.64 GB    0.00%   56713727820156407428984779325531226113
> >>>> >> > 172.17.72.95    DC1 RAC19 Up     Normal  69.58 GB    33.33%  113427455640312814857969558651062452224
> >>>> >> > 45.10.80.148    DC2 RAC9  Up     Normal  59.31 GB    0.00%   113427455640312814857969558651062452225
> >>>> >> >
> >>>> >> >
> >>>> >> > As you can see, the first node has considerably more load than the
> >>>> >> > others (almost double), which is surprising since all these are
> >>>> >> > replicas of each other. I am running Cassandra 0.8.4. Is there an
> >>>> >> > explanation for this behaviour?
> >>>> >> > Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be the
> >>>> >> > cause for this?
> >>>> >> >
> >>>> >> > Thanks
> >>>> >> > -Raj
> >>>> >
> >>>> >
> >>>
> >>>
> >>
> >>
> >
> 
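
For completeness, the "adding 1" token scheme Raj describes above (from the Operations wiki) can be reproduced with a short sketch like this. It's a reconstruction, not the exact script that generated the ring: the integer division below rounds slightly differently from the exact tokens shown, but the layout is the same. Each DC forms its own evenly spaced ring over the RandomPartitioner range, with DC2's tokens offset by 1 so no two nodes share a token; that per-DC layout is also why the pre-CASSANDRA-3412 nodetool shows the 33.33%/0.00% Owns pairs above.

    #!/usr/bin/env python
    # Sketch of the "adding 1" multi-DC token scheme for RandomPartitioner.
    RING_RANGE = 2 ** 127   # RandomPartitioner token space: [0, 2**127)
    NODES_PER_DC = 3

    # DC1: evenly spaced ring; DC2: the same tokens, each offset by 1.
    dc1 = [i * RING_RANGE // NODES_PER_DC for i in range(NODES_PER_DC)]
    dc2 = [t + 1 for t in dc1]

    for i, (t1, t2) in enumerate(zip(dc1, dc2)):
        print("node %d: DC1 token %d, DC2 token %d" % (i, t1, t2))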

