incubator-cassandra-user mailing list archives

From Robert Coli <rc...@eventbrite.com>
Subject Re: Questions related to the data in SSTable files
Date Tue, 22 Oct 2013 21:50:07 GMT
On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 <java8964@hotmail.com> wrote:

> 1) In the data of a full snapshot, I see more than 10% duplicated data.
> By duplication I mean event_activities with the same
> (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp,
> column_timestamp). I am surprised to see such a high level of duplication,
> especially since the column_timestamp is part of the comparison. As I
> understand it, the column_timestamp is provided by the client when Cassandra
> stores the column under the row key. So a small amount of duplication I
> could explain as an application bug, or as duplication coming from
> replication. But more than 10% is too much to explain this way.
>

Have you run "repair"? Do you regularly have hinted handoff kicking in due
to down nodes or dropped messages, such that failed writes are re-delivered
as hints?
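If the snapshot data is processed offline, copies of the same cell that appear in several SSTables (or on several replicas) have to be reconciled, roughly the way Cassandra does when it merges SSTables on read or compaction: the cell with the newer timestamp wins. Below is a minimal Python sketch under stated assumptions. The `Cell` record layout, the choice of key fields, and the tie-break on the larger value are hypothetical, and tombstones are ignored entirely.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Cell(NamedTuple):
    # Hypothetical flattened record from an SSTable export; the field names
    # follow the thread, but the record layout itself is an assumption.
    entity_1_id: str
    entity_2_id: str
    entity_3_id: str
    entity_4_id: str
    created_on_timestamp: int
    column_timestamp: int  # client-supplied write timestamp
    value: bytes


CellKey = Tuple[str, str, str, str, int]


def cell_key(c: Cell) -> CellKey:
    # Assumed identity of one logical cell: row key parts plus column name.
    return (c.entity_1_id, c.entity_2_id, c.entity_3_id,
            c.entity_4_id, c.created_on_timestamp)


def reconcile(cells: Iterable[Cell]) -> Dict[CellKey, Cell]:
    """Collapse copies of the same cell seen in several SSTables or replicas.

    The higher column_timestamp wins; on a tie the larger value wins
    (an assumed tie-break), so exact duplicates collapse into one record.
    Tombstones are not modelled in this sketch.
    """
    winners: Dict[CellKey, Cell] = {}
    for c in cells:
        k = cell_key(c)
        cur = winners.get(k)
        if cur is None or (c.column_timestamp, c.value) > (cur.column_timestamp, cur.value):
            winners[k] = c
    return winners


rows = [
    Cell("a", "b", "c", "d", 100, 5, b"x"),
    Cell("a", "b", "c", "d", 100, 5, b"x"),  # same write seen twice, e.g. replayed as a hint
    Cell("a", "b", "c", "d", 100, 7, b"y"),  # later overwrite of the same cell
    Cell("a", "b", "c", "e", 100, 5, b"z"),  # a different cell
]
merged = reconcile(rows)
print(len(merged))  # 2 distinct cells remain
```

Exact duplicates like the ones described above are harmless under this kind of merge: they collapse into a single cell, which is why they only become visible when reading raw SSTable files directly.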

=Rob
