From: java8964 java8964
To: "user@cassandra.apache.org"
Subject: RE: Questions related to the data in SSTable files
Date: Tue, 22 Oct 2013 20:17:06 -0400
Is there any way I can verify how often the system is being "repaired"? I can ask the other group that maintains the Cassandra cluster. But do you mean that even failed writes will be stored in the SSTable files?

I thought Cassandra would use separate storage for that kind of data, while the regular good data goes into the memtable and then into the SSTable files.

Yong

Date: Tue, 22 Oct 2013 14:50:07 -0700
Subject: Re: Questions related to the data in SSTable files
From: rcoli@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 <java8964@hotmail.com> wrote:

1) In the data of the full snapshot, I see more than 10% duplicated data. By duplication I mean that there are event_activities with the same (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp, column_timestamp). I am surprised to see such a high level of duplication, especially since the column_timestamp is part of the key. As I understand it, the column_timestamp is provided by the client when Cassandra stores the column under the row key. So a small amount of duplication I could explain as an application bug, or as duplication coming from replication. But more than 10% is too much to explain this way.

Have you run "repair"? Do you regularly have hinted handoff kicking in due to down nodes or dropped messages, such that failed writes are re-delivered as hints?

=Rob
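One way to reproduce the "more than 10%" figure from point 1 is to count how many exported rows share the full composite key (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp, column_timestamp). The sketch below is a minimal illustration, assuming the snapshot has already been exported into Python tuples; the `Event` type, the `duplicate_rate` helper and the sample values are hypothetical and not part of any Cassandra tool. If the export reads raw SSTables from every node, each replica's copy of a write is counted as well, so some duplication is expected from replication alone.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical shape of one exported event_activities entry; the fields
    # mirror the composite key described in point 1 of the thread.
    entity_1_id: int
    entity_2_id: int
    entity_3_id: int
    entity_4_id: int
    created_on_timestamp: int
    column_timestamp: int


def duplicate_rate(events: Iterable[Event]) -> float:
    """Fraction of rows that are extra copies of an already-seen key."""
    counts = Counter(events)  # the whole tuple acts as the composite key
    total = sum(counts.values())
    if total == 0:
        return 0.0
    extra_copies = sum(c - 1 for c in counts.values())
    return extra_copies / total


if __name__ == "__main__":
    rows = [
        Event(1, 2, 3, 4, 1000, 1000001),
        Event(1, 2, 3, 4, 1000, 1000001),  # same key and same column timestamp
        Event(1, 2, 3, 4, 1001, 1000002),
        Event(5, 6, 7, 8, 1002, 1000003),
    ]
    # 1 extra copy out of 4 rows, so this prints "duplicate rate: 25%"
    print(f"duplicate rate: {duplicate_rate(rows):.0%}")
```

Counting "extra copies" rather than "rows involved in a duplicate" keeps the number comparable to a replication factor: with every key written three times, the rate tends toward 2/3.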
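On the question of whether re-delivered writes end up in SSTables: as I understand it, a replayed hint is applied on the target replica like any other write (memtable first, then flushed to an SSTable), and repair streams data into new SSTables. Because SSTables are immutable, the same cell can sit in several SSTables on one node, and in every replica's SSTables across the cluster, until compaction merges them; reads and compaction keep the copy with the highest write timestamp. The sketch below models only that last-write-wins merge. `Cell`, `merge_sstables` and the sample data are hypothetical simplifications that ignore tombstones, TTLs and Cassandra's own tie-break for equal timestamps.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    # Hypothetical simplified cell: (row key, column name) -> value @ timestamp.
    row_key: str
    column: str
    value: str
    timestamp: int  # client-supplied write timestamp


def merge_sstables(sstables: Iterable[List[Cell]]) -> Dict[Tuple[str, str], Cell]:
    """Collapse every copy of a cell to the one with the highest timestamp.

    Models last-write-wins only; on equal timestamps this sketch keeps the
    copy it saw first, which is not how Cassandra breaks ties.
    """
    merged: Dict[Tuple[str, str], Cell] = {}
    for sstable in sstables:
        for cell in sstable:
            key = (cell.row_key, cell.column)
            current = merged.get(key)
            if current is None or cell.timestamp > current.timestamp:
                merged[key] = cell
    return merged


if __name__ == "__main__":
    # The original write, then the same write re-delivered later
    # (e.g. as a hint) and flushed into a second SSTable.
    sstable_1 = [Cell("event:1", "activity", "click", 1000)]
    sstable_2 = [
        Cell("event:1", "activity", "click", 1000),  # identical copy
        Cell("event:2", "activity", "view", 1001),
    ]
    raw = sstable_1 + sstable_2
    merged = merge_sstables([sstable_1, sstable_2])
    print(len(raw), "raw cells ->", len(merged), "after merge")  # 3 raw cells -> 2 after merge
```

This is why a raw dump of SSTable files can contain many more identical (key, column_timestamp) entries than a normal read ever returns.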