incubator-cassandra-user mailing list archives

From Haithem Jarraya <haithem.jarr...@struq.com>
Subject Re: Repair Hanging C* 1.2.4
Date Thu, 02 May 2013 12:58:56 GMT
Hi,

I am running the repair as:

    nodetool repair mykeyspace

I do not specify anything else.
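(For reference, the only other form I'd expect us to try is the primary-range variant -- -pr is a standard nodetool repair flag:

    nodetool repair -pr mykeyspace

which limits each run to the ranges the node is primary for, so each range is repaired once across the cluster.)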

Sent from my iPhone

On 2 May 2013, at 13:17, Yuki Morishita <mor.yuki@gmail.com> wrote:

> Hi,
> 
>> ERROR [Thread-12725] 2013-05-01 14:30:54,304 StorageService.java (line 2420) Repair session failed:
>> 
>> java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
>> 
> 
> This error means you are repairing a range that spans multiple (virtual) node ranges.
> I think this won't happen unless you specify the repair range with the -st and -et options.
> 
> How do you start repair?
> 
> --
> Yuki Morishita
> Sent with Airmail
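> To make that concrete, here is a rough sketch of subrange repair done one local range at a time, so that no request spans two vnode ranges. The token list comes from nodetool ring (assuming the usual "Address ... Token" column layout), and the IP is a placeholder:
> 
>     # Placeholder address for the node being repaired.
>     HOST=x.x.x.23
>     # Tokens owned by this node: last column of `nodetool ring`.
>     TOKENS=$(nodetool ring | awk -v h="$HOST" '$1 == h { print $NF }' | sort -n)
>     PREV=""
>     for T in $TOKENS; do
>       # Repair the range (PREV, T], which lies inside a single local range.
>       [ -n "$PREV" ] && nodetool repair -st "$PREV" -et "$T" mykeyspace
>       PREV=$T
>     done
>     # The wrap-around range (last token back to the first) is skipped in
>     # this sketch and would need one extra repair call.
> 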
> On May 2, 2013, Haithem Jarraya (haithem.jarraya@struq.com) wrote:
> 
>> Hi All,
>> 
>>  
>> 
>> Cassandra repair has been a real pain for us, and it has been holding back our migration from Mongo for quite some time now.
>> 
>> We saw errors like this during the repair:
>> 
>>  INFO [AntiEntropyStage:1] 2013-05-01 14:30:54,300 AntiEntropyService.java (line 764) [repair #ed104480-b26a-11e2-af9b-05179fa66b76] mycolumnfamily is fully synced (1 remaining column family to sync for this session)
>> 
>> ERROR [Thread-12725] 2013-05-01 14:30:54,304 StorageService.java (line 2420) Repair session failed:
>> 
>> java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
>> 
>>         at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:175)
>>         at org.apache.cassandra.service.AntiEntropyService$RepairSession.<init>(AntiEntropyService.java:621)
>>         at org.apache.cassandra.service.AntiEntropyService$RepairSession.<init>(AntiEntropyService.java:610)
>>         at org.apache.cassandra.service.AntiEntropyService.submitRepairSession(AntiEntropyService.java:127)
>>         at org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:2480)
>>         at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2416)
>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.lang.Thread.run(Thread.java:662)
>> 
>> OK, we might have gone beyond GCGraceSeconds (again, because repair does not complete), so we ran scrub on all nodes in parallel, as was suggested on this mailing list.
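>> (Roughly what we ran, sketched here with placeholder host names:)
>> 
>>     # Kick off scrub on every node at once; hosts are placeholders.
>>     for h in node1 node2 node3 node4 node5 node6; do
>>       ssh "$h" nodetool scrub mykeyspace &
>>     done
>>     wait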
>> 
>> I am not sure whether this is the cause of the problem, but in reality we have had this issue of repair hanging and not completing since the day we started testing Cassandra 1.2.2, and the same issue has occurred with every upgrade: 1.2.3 and now 1.2.4.
>> 
>> I want a way to kick off a repair again if it hangs, or to cancel the previous one, without restarting the cluster; we can't afford to do that once we go live.
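>> (Something like this JMX call is what we are after -- a sketch assuming the StorageService MBean exposes forceTerminateAllRepairSessions, which we would verify in jconsole first; jmxterm is a third-party JMX command-line client:)
>> 
>>     # Terminate stuck repair sessions on one node without a restart.
>>     java -jar jmxterm.jar -l localhost:7199 <<'EOF'
>>     bean org.apache.cassandra.db:type=StorageService
>>     run forceTerminateAllRepairSessions
>>     EOF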
>> 
>>  
>> 
>> Let me start by presenting our current configuration.
>> 
>> Data Centers:
>> 
>> 2 data centers (Amsterdam: 6 nodes with RF 3; Washington D.C.: RF 1)
>> 
>> 1 keyspace with 3 column families, ~100 GB of data.
>> 
>> Each node runs Cassandra 1.2.4 with Java 6 update 45 on CentOS (2.6 kernel), with 32 GB of RAM, 24 cores @ 2.00 GHz, JNA v3.2.4 installed, and 2 disks (1 rotational for the OS and commit logs, 1 SSD for the data). We are getting really good read performance: 99th percentile < 10 ms, 95th percentile < 5 ms.
>> 
>>  
>> 
>> nodetool status
>> 
>> Datacenter: ams01
>> =================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address    Load      Tokens  Owns   Host ID                               Rack
>> UN  x.x.x.23   34.04 GB  256     13.1%  4a7bc489-25af-4c20-80f8-499ffcb18e2d  RAC1
>> UN  x.x.x.79   28.53 GB  256     12.6%  98a1167f-cf75-4201-a454-695e0f7d2d72  RAC1
>> UN  x.x.x.78   41.31 GB  256     11.9%  62a418b5-3c38-4f66-874d-8138d6d565e5  RAC1
>> UN  x.x.x.66   54.41 GB  256     13.8%  ab564d16-4081-4866-b8ba-26461d9a93d7  RAC1
>> UN  x.x.x.91   45.92 GB  256     12.6%  2e1e7179-82e6-4ae6-b986-383acc9fc8a2  RAC1
>> UN  x.x.x.126  37.31 GB  256     11.8%  d4bed3b1-ffaf-4c68-b560-d270355c8c4b  RAC1
>> 
>> Datacenter: wdc01
>> =================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address    Load      Tokens  Owns   Host ID                               Rack
>> UN  x.x.x.144  30.64 GB  256     12.0%  1860011e-fa7c-4ce1-ad6b-c8a38a5ddd02  RAC1
>> UN  x.x.x.140  86.05 GB  256     12.3%  f3fa985d-5056-4ddc-b146-d02432c3a86e  RAC1
>>  
>> 
>> nodetool status mykeyspace
>> 
>> Datacenter: ams01
>> =================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
>> UN  x.x.x.66   54.41 GB  256     53.6%             ab564d16-4081-4866-b8ba-26461d9a93d7  RAC1
>> UN  x.x.x.91   45.92 GB  256     52.1%             2e1e7179-82e6-4ae6-b986-383acc9fc8a2  RAC1
>> UN  x.x.x.126  37.31 GB  256     47.9%             d4bed3b1-ffaf-4c68-b560-d270355c8c4b  RAC1
>> UN  x.x.x.23   34.04 GB  256     50.9%             4a7bc489-25af-4c20-80f8-499ffcb18e2d  RAC1
>> UN  x.x.x.79   28.53 GB  256     47.4%             98a1167f-cf75-4201-a454-695e0f7d2d72  RAC1
>> UN  x.x.x.78   41.31 GB  256     48.0%             62a418b5-3c38-4f66-874d-8138d6d565e5  RAC1
>> 
>> Datacenter: wdc01
>> =================