incubator-cassandra-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: Questions related to the data in SSTable files
Date Wed, 23 Oct 2013 00:17:06 GMT
Is there any way I can verify how often the system is being "repaired"? I can ask the other group that maintains
the Cassandra cluster. But do you mean that even failed writes will be stored in the SSTable
files?
I thought Cassandra would use separate storage for that kind of data, while the regular
good data goes into the memtable and then into the SSTable files.
Yong

Date: Tue, 22 Oct 2013 14:50:07 -0700
Subject: Re: Questions related to the data in SSTable files
From: rcoli@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 <java8964@hotmail.com> wrote:




1) In the data from the full snapshot, I see more than 10% duplicated data. By duplication I mean
that there are event_activities with the same (entity_1_id, entity_2_id, entity_3_id, entity_4_id,
created_on_timestamp, column_timestamp). I am surprised to see such a high level of duplication,
especially when the column_timestamp is included in the comparison. As I understand it, the column_timestamp
is provided by the client when Cassandra stores the column in the row key data. So if there
were only a small amount of duplication, I could explain it as an application bug, or as duplication coming
from replication. But more than 10% is too much to explain this way.
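
For reference, the duplication measure described above can be sketched as follows. This is only an illustration, assuming the rows have already been extracted from the snapshot into tuples of (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp, column_timestamp); the extraction step is not shown, and the function name and sample values are hypothetical.

```python
from collections import Counter


def duplication_ratio(records):
    """Return (duplicates, total, ratio) for an iterable of key tuples.

    Every occurrence of a key beyond its first one counts as a duplicate.
    """
    counts = Counter(records)
    total = sum(counts.values())
    duplicates = total - len(counts)
    ratio = duplicates / total if total else 0.0
    return duplicates, total, ratio


# Hypothetical sample rows, in the order:
# (entity_1_id, entity_2_id, entity_3_id, entity_4_id,
#  created_on_timestamp, column_timestamp)
sample = [
    (1, 10, 100, 1000, 1382400000, 1382400000123000),
    (1, 10, 100, 1000, 1382400000, 1382400000123000),  # exact repeat
    (1, 10, 100, 1000, 1382400000, 1382400000456000),  # differs only in column_timestamp
    (2, 20, 200, 2000, 1382400500, 1382400500789000),
]

duplicates, total, ratio = duplication_ratio(sample)
print(f"{duplicates} of {total} rows are duplicates ({ratio:.0%})")
```

Because column_timestamp is part of the compared tuple, a row counts as a duplicate only when the client-supplied timestamp matches as well, which is the strict definition used in the question.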

Have you run "repair"? Do you regularly have hinted handoff kicking in due to down nodes or
dropped messages, such that failed writes are re-delivered as hints?
 =Rob
 		 	   		  