From: Yan Chunlu
Date: Thu, 21 Jul 2011 10:11:51 +0800
Subject: Re: node repair eat up all disk io and slow down entire cluster(3 nodes)
To: user@cassandra.apache.org

Thank you very much for the help. I will try adjusting minor compaction
and also repairing a single CF at a time.

On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton <aaron@thelastpickle.com> wrote:

> If you have never run repair, also check the section on repair on this
> page, which covers how frequently it should be run:
> http://wiki.apache.org/cassandra/Operations
>
> There is an issue where repair can stream too much data, and this can
> lead to excessive disk use.
>
> My non-scientific approach to the never-run-repair-before problem is to
> repair a single CF at a time, starting with the small ones that are less
> likely to have differences, as they will stream the smallest amount of
> data.
>
> If you really want to conserve disk IO during the repair, consider
> disabling minor compaction by setting the min and max thresholds to 0
> via nodetool.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/07/2011, at 11:46 PM, Yan Chunlu <springrider@gmail.com> wrote:
>
> I just found this:
>
> https://issues.apache.org/jira/browse/CASSANDRA-2156
>
> But it seems to be available only in 0.8, and people have submitted a
> patch for 0.6. I am using 0.7.4; do I need to dig into the code and make
> my own patch?
>
> Would adding compaction throttling solve the IO problem? Thanks!
>
> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu <springrider@gmail.com> wrote:
>
>> When I started using Cassandra, I had no idea that I should run
>> "node repair" frequently. So basically, I have 3 nodes with RF=3 and
>> have not run node repair for months; the data size is 20G.
>>
>> The problem is that when I start running node repair now, it eats up
>> all disk IO and the server load climbs to 20+ and keeps increasing.
>> Worst of all, the entire cluster slows down and cannot handle requests,
>> so I have to stop it immediately because it makes my web service
>> unavailable.
>>
>> The server has an Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G
>> memory, with a Western Digital WD RE3 WD1002FBYS SATA disk.
>>
>> I really have no idea what to do now. I have already found some data
>> loss, so any suggestions would be appreciated.
>
> --
> 闫春路

--
闫春路
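
The advice in this thread (repair one CF at a time, smallest first, optionally disabling minor compaction for the duration) could be scripted roughly as follows. This is a minimal sketch and not something posted in the thread: the function name, host, keyspace, CF names and sizes are all hypothetical, and the nodetool argument order for `repair` and `setcompactionthreshold` is an assumption that varies between Cassandra versions, so check the nodetool help output for your release. The restore values 4 and 32 are assumed defaults. The script only builds the command lines; it does not run anything.

```python
from typing import Dict, List, Tuple


def plan_repair_commands(
    host: str,
    keyspace: str,
    cf_sizes_bytes: Dict[str, int],
    pause_minor_compaction: bool = False,
    restore_thresholds: Tuple[int, int] = (4, 32),  # assumed defaults
) -> List[List[str]]:
    """Build nodetool command lines that repair one CF at a time.

    CFs are ordered smallest first, so the cheapest repairs run before
    the expensive ones. Nothing is executed here.
    """
    commands: List[List[str]] = []
    # Sort by size, then by name so the order is deterministic on ties.
    ordered = sorted(cf_sizes_bytes, key=lambda cf: (cf_sizes_bytes[cf], cf))
    for cf in ordered:
        base = ["nodetool", "-h", host]
        if pause_minor_compaction:
            # Assumed syntax: min and max thresholds of 0 disable
            # minor compaction for this CF.
            commands.append(
                base + ["setcompactionthreshold", keyspace, cf, "0", "0"]
            )
        # Assumed syntax: repair <keyspace> <cf> limits repair to one CF.
        commands.append(base + ["repair", keyspace, cf])
        if pause_minor_compaction:
            # Put the thresholds back once this CF has been repaired.
            lo, hi = restore_thresholds
            commands.append(
                base + ["setcompactionthreshold", keyspace, cf, str(lo), str(hi)]
            )
    return commands


if __name__ == "__main__":
    # Hypothetical CF names and on-disk sizes, for illustration only.
    sizes = {"events": 14_000_000_000, "sessions": 200_000_000, "users": 5_000_000_000}
    for cmd in plan_repair_commands("127.0.0.1", "MyKeyspace", sizes):
        print(" ".join(cmd))
```

Printing the plan first makes it possible to review the order and run each step by hand during a quiet period, stopping if load climbs again.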