cassandra-user mailing list archives

From Robert Coli
Subject Re: Questions related to the data in SSTable files
Date Tue, 22 Oct 2013 21:50:07 GMT
On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 wrote:

> 1) In the full snapshot data, I see more than 10% duplicated data. By
> duplication I mean that there are event_activities with the same
> (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp,
> column_timestamp). I am surprised to see such a high level of duplication,
> especially when the column_timestamp is included. As I understand it, the
> column_timestamp is provided by the client when Cassandra stores the
> column in the row. So if there were only a small amount of duplication,
> I could explain it as an application bug or as duplication that comes
> from replication. But more than 10% is too much to explain this way.

Have you run "repair"? Do you regularly have hinted handoff kicking in due
to down nodes or dropped messages, such that failed writes are re-delivered
as hints?
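
To make the reasoning behind the question concrete: a write that is
re-delivered (as a hint, or streamed by repair) carries the same client
timestamp as the original, so the identical cell can sit in more than one
SSTable until compaction merges the copies. Below is a minimal sketch, not
Cassandra code, of how one might measure that duplication in exported data
and what timestamp-based reconciliation would leave behind. The tuple
layout (row_key, column, timestamp, value) and both function names are
assumptions made for illustration.

```python
from collections import Counter


def duplicate_fraction(cells):
    """Return the share of cells that are exact repeats of an earlier cell.

    `cells` is an iterable of (row_key, column, timestamp, value) tuples,
    a hypothetical flattened export of SSTable contents.
    """
    counts = Counter(cells)
    total = sum(counts.values())
    extra = sum(n - 1 for n in counts.values())
    return extra / total if total else 0.0


def reconcile(cells):
    """Keep one cell per (row_key, column): the one with the highest timestamp.

    This mimics, in a simplified way, the timestamp-based merge that
    happens at read time and during compaction.
    """
    merged = {}
    for row_key, column, timestamp, value in cells:
        key = (row_key, column)
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Hypothetical sample: one exact duplicate (same timestamp) and one overwrite.
sample = [
    ("r1", "c1", 100, "a"),
    ("r1", "c1", 100, "a"),  # re-delivered write, identical timestamp
    ("r1", "c2", 105, "b"),
    ("r2", "c1", 90, "x"),
    ("r2", "c1", 95, "y"),   # later write to the same column
]
```

If the exact-repeat fraction computed this way is close to the 10% you
observe, re-delivery or repair is a more likely explanation than an
application bug, since a client retry would normally get a new timestamp.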

