incubator-cassandra-user mailing list archives

From Nimi Wariboko Jr <nimiwaribo...@gmail.com>
Subject Re: Data Loss/Missing With Cassandra
Date Sun, 09 Jun 2013 20:19:18 GMT
If I had to do a repair after upping the RF, then skipping it is probably what caused the data loss.
I wish I had been more careful.

I'm guessing the data is irrevocably lost; I didn't make any snapshots.

Would it be possible to figure out if only a certain part of the ring was affected? That would be helpful in figuring out what data was lost.

I've done a full repair now, so I'm also guessing that inconsistent data is now completely
gone as well, right? 
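
Edward's advice in the quoted reply, that repair has to be run on every node after the replication factor is raised, can be scripted. What follows is a minimal sketch, not a tool from this thread. It assumes `nodetool` is on the PATH and can reach each node with `-h`, reuses the three addresses from the `nodetool status` output, and uses a hypothetical keyspace name, `my_keyspace`. By default it only prints the commands.

```python
import shlex
import subprocess

# Node addresses taken from the `nodetool status` output in this thread.
HOSTS = ["10.129.196.4", "10.129.196.5", "10.129.196.6"]
KEYSPACE = "my_keyspace"  # hypothetical: use the keyspace whose RF was raised


def repair_commands(hosts, keyspace):
    """Build one `nodetool -h HOST repair KEYSPACE` command per node."""
    return [["nodetool", "-h", host, "repair", keyspace] for host in hosts]


def run_repairs(hosts, keyspace, dry_run=True):
    """Repair nodes one at a time; with dry_run=True only print the commands."""
    for cmd in repair_commands(hosts, keyspace):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so a failed repair stops the loop instead of being skipped.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_repairs(HOSTS, KEYSPACE)
```

The repairs run sequentially rather than in parallel, which keeps the streaming load on a small cluster down. Pass `dry_run=False` only once the printed commands look right.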

On Sunday, June 9, 2013 at 10:37 AM, Edward Capriolo wrote:

> Sounds like your cluster got shufflef*cked.
> You said: "After we had gotten all the data moved over we decided to add 2 more nodes, as well as up the RF to 2."
> 
> After you raise the replication factor you have to run repair on all nodes. If you did not, and then proceeded to shuffle, you will likely have data loss.
> 
> If you did repair all nodes before the shuffle, then I do not know; the shuffle must have gone wrong. If you're reading at CL.ALL and still seeing inconsistencies, that is bad. Possibly try raising the read repair chance to 100%, keep reading, and see if the data becomes consistent (though I do not know why repair would not do it).
> 
> On Sat, Jun 8, 2013 at 8:56 PM, Nimi Wariboko Jr <nimiwaribokoj@gmail.com> wrote:
> > Hi, 
> > 
> > We are seeing an issue where data that was written to the cluster is no longer accessible after we tried to expand the cluster. I will try to provide as much information as possible; I am just starting out with Cassandra and I'm not entirely sure what information is relevant.
> > 
> > All Cassandra nodes are 1.2.5, and each node has the same config. 
> > 
> > We started out by moving our entire data set to a single Cassandra node. This node was initially set up with initial_token: 0, with the other settings left at their defaults. After we had gotten all the data moved over we decided to add 2 more nodes, as well as up the RF to 2. We also decided to start using vnodes, which meant setting num_tokens to 256 and removing the initial_token setting. We then decided to run cassandra-shuffle as well.
> > 
> > During cassandra-shuffle we started to notice that some rows were disappearing and then reappearing, while other rows haven't come back at all. I decided to stop the shuffle, repair each node, and then restart the cluster; however, not all of the data has come back. Note that this is at CONSISTENCY ALL.
> > 
> > Here is my `nodetool status`. What is weird here is the token distribution, 260-239-1. I'm not an expert, but I believe it should be 256-256-256, or at least add up to 768.
> > 
> > Datacenter: 129
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address       Load       Tokens  Owns   Host ID                               Rack
> > UN  10.129.196.4  371.56 GB  260     38.1%  cde6c3be-a066-47f2-abc2-b1d78bee0d7c  196
> > UN  10.129.196.5  212.64 GB  239     61.5%  2cb24510-2f89-46b2-96b9-873f8e8e50da  196
> > UN  10.129.196.6  256.05 GB  1       0.4%   ce8d4ea9-8106-44b3-a2dd-c0230eb53c94  196
> > 
> > 
> > (http://pastebin.com/37SwNaGq)
> > 
> > And here is the OpsCenter ring view (http://imgur.com/VssmFlw).
> > 
> > What is also weird is that the token count from `nodetool -h [host] info` differs from the one in `nodetool status`.
> > 
> > Example:
> > root@cass1:~# nodetool -h cass1 info | grep Token
> > Token            : (invoke with -T/--tokens to see all 239 tokens)
> > root@cass1:~# nodetool -h cass2 info | grep Token
> > Token            : (invoke with -T/--tokens to see all 269 tokens)
> > root@cass1:~# nodetool -h cass3 info | grep Token
> > Token            : (invoke with -T/--tokens to see all 260 tokens)
> > 
> > 
> > (Full output: http://pastebin.com/2hxpArt0)
> > 
> > I believe it has something to do with the cluster not "seeing" all the tokens, but I am not sure where to go from here. I don't believe any data was lost: there was no power outage, and all the data should have been committed to disk before we added the two other nodes.
> > 
> > Thanks,
> > Nimi
> > nimiwaribokoj@gmail.com
> > 

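The token arithmetic in Nimi's message can be checked mechanically. The sketch that follows is a hypothetical helper (`tokens_per_node` is not a Cassandra tool) that parses the three data rows of the quoted `nodetool status` output and compares the total with the per-host counts that `nodetool -h [host] info` reported, assuming every node runs with num_tokens: 256. The thread does not say which of cass1 to cass3 maps to which address, so only totals are compared.

```python
# Data rows copied from the `nodetool status` output in this thread.
STATUS = """\
UN  10.129.196.4  371.56 GB  260     38.1%  cde6c3be-a066-47f2-abc2-b1d78bee0d7c  196
UN  10.129.196.5  212.64 GB  239     61.5%  2cb24510-2f89-46b2-96b9-873f8e8e50da  196
UN  10.129.196.6  256.05 GB  1       0.4%   ce8d4ea9-8106-44b3-a2dd-c0230eb53c94  196
"""

# Token counts reported by `nodetool -h <host> info` in the same message.
INFO_COUNTS = {"cass1": 239, "cass2": 269, "cass3": 260}

NUM_TOKENS = 256  # assumption: num_tokens: 256 on every node


def tokens_per_node(status_text):
    """Return {address: token count} for each data row of `nodetool status`."""
    counts = {}
    for line in status_text.splitlines():
        fields = line.split()
        code = fields[0] if fields else ""
        # Data rows start with a two-letter code: U/D (status) + N/L/J/M (state).
        if len(fields) >= 5 and len(code) == 2 and code[0] in "UD" and code[1] in "NLJM":
            # Load spans two fields ("371.56", "GB"), so Tokens is fields[4].
            counts[fields[1]] = int(fields[4])
    return counts


if __name__ == "__main__":
    counts = tokens_per_node(STATUS)
    print("status total:", sum(counts.values()))  # 500
    print("info total:", sum(INFO_COUNTS.values()))  # 768
    print("expected:", NUM_TOKENS * len(counts))  # 768
    for address, n in counts.items():
        if n != NUM_TOKENS:
            print(f"{address}: {n} tokens ({n - NUM_TOKENS:+d} vs. num_tokens)")
```

On these numbers the per-host `info` counts add up to the expected 768, while the `status` view accounts for only 500. That matches Nimi's suspicion that the cluster is not seeing all the tokens, although the arithmetic alone does not explain why.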