From: Yan Chunlu
Date: Sun, 7 Aug 2011 16:30:10 +0800
To: user@cassandra.apache.org
Subject: Re: how to solve one node is in heavy load in unbalanced cluster

Thanks for the confirmation, Aaron!

On Sun, Aug 7, 2011 at 4:01 PM, aaron morton wrote:

> Move first removes the node from the cluster, then adds it back:
> http://wiki.apache.org/cassandra/Operations#Moving_nodes
>
> If you have 3 nodes and RF 3, removing the node will result in the error
> you are seeing. There are not enough nodes in the cluster to satisfy the
> replication factor.
>
> You can drop the RF down to 2 temporarily and then put it back to 3 later;
> see http://wiki.apache.org/cassandra/Operations#Replication
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5 Aug 2011, at 03:39, Yan Chunlu wrote:
>
> Hi, any help? Thanks!
>
> On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu wrote:
>
>> Forgot to mention: I am using Cassandra 0.7.4.
>>
>> On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu wrote:
>>
>>> Also, nothing is happening with the streaming:
>>>
>>> nodetool -h node3 netstats
>>> Mode: Normal
>>> Not sending any streams.
>>>  Nothing streaming from /10.28.53.11
>>> Pool Name                    Active   Pending      Completed
>>> Commands                        n/a         0      165086750
>>> Responses                       n/a         0       99372520
>>>
>>> On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu wrote:
>>>
>>>> Sorry, the ring info should be this:
>>>>
>>>> nodetool -h node3 ring
>>>> Address  Status State    Load      Owns    Token
>>>>                                            84944475733633104818662955375549269696
>>>> node1    Up     Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
>>>> node2    Up     Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
>>>> node3    Up     Leaving  25.44 GB  8.43%   84944475733633104818662955375549269696
>>>>
>>>> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu wrote:
>>>>
>>>>> I tried nodetool move but got the following error:
>>>>>
>>>>> node3:~# nodetool -h node3 move 0
>>>>> Exception in thread "main" java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
>>>>>     at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>>>>>     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>>>>>     at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>>>>>     at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>>>>>     at org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
>>>>>     at org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>>>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>>>     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>>>     at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>>>     at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>>>>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>>>     at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>>>>     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>>>     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>>>     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>>>     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>>>     at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>>>     at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>>>     at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>> Then nodetool shows the node as leaving:
>>>>>
>>>>> nodetool -h node3 ring
>>>>> Address  Status State    Load      Owns    Token
>>>>>                                            84944475733633104818662955375549269696
>>>>> node1    Up     Normal   13.18 GB  81.09%  52773518586096316348543097376923124102
>>>>> node2    Up     Normal   22.85 GB  10.48%  70597222385644499881390884416714081360
>>>>> node3    Up     Leaving  25.44 GB  8.43%   84944475733633104818662955375549269696
>>>>>
>>>>> The log didn't show any error message or anything abnormal. Is
>>>>> something wrong?
>>>>>
>>>>> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
>>>>>
>>>>> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu wrote:
>>>>>
>>>>>> Thanks a lot! I will try the "move".
>>>>>>
>>>>>> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra wrote:
>>>>>>
>>>>>>> springrider wrote:
>>>>>>> >
>>>>>>> > Is it okay to do nodetool move before a complete repair?
>>>>>>> >
>>>>>>> > Using this equation?
>>>>>>> > def tokens(nodes):
>>>>>>> >     for x in range(nodes):
>>>>>>> >         print(2 ** 127 // nodes * x)
>>>>>>> >
>>>>>>>
>>>>>>> Yes, use that logic to get the tokens. I think it's safe to run move
>>>>>>> first and repair later. You are moving some nodes' data as-is, so it's
>>>>>>> no worse than what you have right now.
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>>>>>>> Sent from the cassandra-user@incubator.apache.org mailing list
>>>>>>> archive at Nabble.com.
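The token formula quoted at the bottom of the thread gives node i the token i * (2**127 / N). Here is a minimal Python 3 sketch of the same calculation. It assumes the RandomPartitioner token range of 0 to 2**127 that the formula implies. `initial_tokens` is a hypothetical helper name and is not part of Cassandra or nodetool. Integer division (`//`) keeps these 127-bit values exact. Python 3's `/` would instead round them through a float.

```python
def initial_tokens(nodes):
    """Evenly spaced tokens over 0..2**127 (hypothetical helper, sketch only)."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # Floor division keeps the arithmetic in exact integers; true division
    # (/) would return a float and silently lose precision on 127-bit values.
    return [2 ** 127 // nodes * i for i in range(nodes)]


if __name__ == "__main__":
    # One balanced token per node for a three-node ring.
    for token in initial_tokens(3):
        print(token)
```

For three nodes this prints 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070484. Each node would then be moved to one of these values with `nodetool -h <host> move <token>`, as in the thread.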
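Aaron's explanation can be made concrete with a toy model of the three-node ring posted in the thread. This is a sketch and not Cassandra's implementation. It rests on three assumptions:

- The RandomPartitioner token space runs from 0 to 2**127.
- Each node owns the range from the previous node's token (exclusive) up to its own token.
- Replicas are placed by a simplified SimpleStrategy-style walk that goes clockwise around the ring and collects RF distinct nodes.

`ownership` and `replicas` are hypothetical helpers. Where Cassandra 0.7.4 threw `IllegalStateException`, the toy raises a Python `ValueError`.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption)


def ownership(ring):
    """Fraction of the token space owned by each node.

    `ring` maps node name -> token. A node owns (previous token, own token],
    and the lowest token wraps around to the highest one.
    """
    ordered = sorted(ring.items(), key=lambda item: item[1])
    if len(ordered) == 1:
        return {ordered[0][0]: 1.0}
    shares = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # for i == 0 this is the highest token
        shares[node] = ((token - prev_token) % RING_SIZE) / RING_SIZE
    return shares


def replicas(ring, key_token, rf):
    """Simplified SimpleStrategy-style placement (toy model, not Cassandra code)."""
    if rf > len(ring):
        # Same condition as the error reported in the thread.
        raise ValueError(
            "replication factor (%d) exceeds number of endpoints (%d)" % (rf, len(ring))
        )
    ordered = sorted(ring, key=ring.get)
    # The first node whose token is >= the key's token is the primary replica;
    # keys above the highest token wrap around to the lowest-token node.
    start = next((i for i, node in enumerate(ordered) if ring[node] >= key_token), 0)
    # Walk clockwise to collect rf distinct nodes.
    return [ordered[(start + k) % len(ordered)] for k in range(rf)]


# Tokens from the `nodetool ring` output in the thread.
RING = {
    "node1": 52773518586096316348543097376923124102,
    "node2": 70597222385644499881390884416714081360,
    "node3": 84944475733633104818662955375549269696,
}

if __name__ == "__main__":
    for node, share in sorted(ownership(RING).items()):
        print("%s owns %.2f%%" % (node, share * 100))

    # Per Aaron, move first takes the node out of the ring: two endpoints remain.
    without_node3 = {n: t for n, t in RING.items() if n != "node3"}
    print("RF=2 replicas for token 0:", replicas(without_node3, 0, 2))
    try:
        replicas(without_node3, 0, 3)
    except ValueError as err:
        print("RF=3 fails:", err)
```

With these tokens, the model reproduces the Owns column (81.09%, 10.48%, 8.43%). Once node3 is taken out of the ring, only two endpoints remain. RF=3 can then no longer be placed, while RF=2 still can. This matches the suggestion to lower the RF temporarily before the move.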