From: Viktor Jevdokimov <Viktor.Jevdokimov@adform.com>
To: user@cassandra.apache.org
Subject: RE: repair
Date: Mon, 4 Jun 2012 14:02:51 +0000

Why without -pr when recovering from a crash?

Repair without -pr repairs every range the receiving node holds a replica of. The node that receives the command acts as the repair coordinator, and all nodes holding replicas of those ranges (in a small cluster, ALL nodes) synchronize them at the same time, streaming data between each other.

Problems that may arise:

• When streaming hangs (it tends to hang even on a stable network), the repair session hangs too (does any version re-stream?)
• The network will be heavily saturated.
• With a high degree of inconsistency, some nodes may receive a lot of data, and disk usage can grow to much more than 2x (depending on RF).
• A lot of compactions will be pending.
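
A rough count shows how the load multiplies when repair without -pr is run on every node. The count makes several assumptions, none of which appear in the thread: SimpleStrategy, one token per node, N nodes and replication factor RF, with N ≥ RF. Each range has RF replicas, so each node replicates RF ranges. The number of range repairs R is then:

$$
R_{\text{without -pr}} = N \cdot RF, \qquad R_{\text{with -pr}} = N
$$

So every range is repaired RF times without -pr and exactly once with -pr. Take 3 nodes and an assumed RF = 3. That is 9 range repairs instead of 3, and a single repair without -pr already involves every node.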

IMO, the best way to run repair is from a script. Use -pr, repair a single CF on a single node at a time, and monitor progress, like:

nodetool -h node1 repair -pr ks1 cf1
nodetool -h node2 repair -pr ks1 cf1
nodetool -h node3 repair -pr ks1 cf1
nodetool -h node1 repair -pr ks1 cf2
nodetool -h node2 repair -pr ks1 cf2
nodetool -h node3 repair -pr ks1 cf2

Put progress checks or other controls in between, as you see fit.
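
A minimal sketch of such a script follows. It assumes bash, nodetool's `-h` (host) option and the `repair -pr <keyspace> <cf>` form. The keyspace, column family and node names are the hypothetical ks1/cf1/node1 placeholders from the list above. The outer loop walks column families and the inner loop walks nodes, so only one repair session runs at a time. The script stops at the first failure so someone can look before continuing.

```bash
#!/usr/bin/env bash
# Sketch: run "nodetool repair -pr" for one column family on one node at a
# time. Keyspace, column families and nodes are supplied by the caller;
# ks1/cf1/node1 in the examples are hypothetical placeholders.
set -euo pipefail

# Command to invoke; set NODETOOL=echo to print the commands instead (dry run).
NODETOOL="${NODETOOL:-nodetool}"

# repair_sequentially KEYSPACE "CF1 CF2 ..." NODE1 [NODE2 ...]
# Outer loop over column families, inner loop over nodes: the primary range
# of every node is repaired for one CF before moving on to the next CF.
repair_sequentially() {
  local keyspace="$1" cfs="$2"
  shift 2
  local cf node
  for cf in $cfs; do          # unquoted on purpose: split the CF list on spaces
    for node in "$@"; do
      echo "$(date '+%F %T') starting repair -pr on ${node} for ${keyspace}.${cf}" >&2
      # Assumes nodetool blocks until the repair finishes and exits non-zero
      # on failure; stop at the first failure so someone can investigate.
      if ! "$NODETOOL" -h "$node" repair -pr "$keyspace" "$cf"; then
        echo "repair failed on ${node} for ${keyspace}.${cf}; stopping" >&2
        return 1
      fi
      # This is the place for "progress or other control" between steps,
      # e.g. a pause or a look at nodetool netstats / compactionstats.
    done
  done
}

# Only run when arguments are given, e.g.:
#   ./repair.sh ks1 "cf1 cf2" node1 node2 node3
if [[ $# -ge 3 ]]; then
  repair_sequentially "$@"
fi
```

For example, `NODETOOL=echo ./repair.sh ks1 "cf1 cf2" node1 node2 node3` prints the arguments of each nodetool call in order without running anything. The name `repair.sh` is hypothetical. The same script could also be a single cron entry, which may be simpler than keeping per-node cron jobs at different hours from overlapping. A hung stream also hangs the nodetool call, so it is still worth watching the run, e.g. with `nodetool netstats` and `nodetool compactionstats`.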

Use repair with care; do not let your cluster go down.

Best regards
Viktor Jevdokimov
Senior Developer
Email: Viktor.Jevdokimov@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: R. Verlangen [mailto:robin@us2.nl]
Sent: Monday, June 04, 2012 15:17
To: user@cassandra.apache.org
Subject: Re: repair

 

The "repair -pr" only repairs the node's primary range, so it is only useful in day-to-day use. When you're recovering from a crash, use it without -pr.

2012/6/4 Romain HARDOUIN <romain.hardouin@urssaf.fr>


Run "repair -pr" in your cron.

Tamar Fraenkel <tamar@tok-media.com> wrote on 04/06/2012 13:44:32:

> Thanks. 

>
> I actually did just that, with cron jobs running at different hours.

>
> I asked the question because I saw that when one of the nodes was
> running the repair, all nodes logged some repair-related entries in
> /var/log/cassandra/system.log

>
> Thanks again,

> Tamar Fraenkel
> Senior Software Engineer, TOK Media 



 

--
With kind regards,

 

Robin Verlangen

Software engineer

W www.robinverlangen.nl
E robin@us2.nl
