Return-Path:
X-Original-To: apmail-cassandra-user-archive@www.apache.org
Delivered-To: apmail-cassandra-user-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CFBC610F65 for ; Tue, 17 Sep 2013 13:51:11 +0000 (UTC)
Received: (qmail 49373 invoked by uid 500); 17 Sep 2013 13:51:08 -0000
Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org
Received: (qmail 49313 invoked by uid 500); 17 Sep 2013 13:51:08 -0000
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: user@cassandra.apache.org
Delivered-To: mailing list user@cassandra.apache.org
Received: (qmail 49305 invoked by uid 99); 17 Sep 2013 13:51:07 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Sep 2013 13:51:07 +0000
X-ASF-Spam-Status: No, hits=2.4 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (athena.apache.org: domain of java8964@hotmail.com designates 65.54.51.88 as permitted sender)
Received: from [65.54.51.88] (HELO snt0-omc3-s51.snt0.hotmail.com) (65.54.51.88) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Sep 2013 13:51:02 +0000
Received: from SNT149-W61 ([65.55.90.136]) by snt0-omc3-s51.snt0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Tue, 17 Sep 2013 06:50:41 -0700
X-TMN: [urzQXVVPdWe7wpGvHY3i4qIxANsamB+FpgmCUWiguzw=]
X-Originating-Email: [java8964@hotmail.com]
Message-ID:
Content-Type: multipart/alternative; boundary="_725f04c5-2f5c-4004-b581-913b3944648e_"
From: java8964 java8964
To: "user@cassandra.apache.org"
Subject: RE: questions related to the SSTable file
Date: Tue, 17 Sep 2013 09:50:40 -0400
Importance: Normal
In-Reply-To:
References: ,
MIME-Version: 1.0
X-OriginalArrivalTime: 17 Sep 2013 13:50:41.0235 (UTC) FILETIME=[E5DA2A30:01CEB3AC]
X-Virus-Checked: Checked by ClamAV on apache.org

--_725f04c5-2f5c-4004-b581-913b3944648e_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Thanks, Dean, for the clarification.

But if I put hundreds of megabytes of data for one row through a single put, you mean Cassandra will put all of it into one SSTable, even though the data is very big, right? Let's assume that in this case the memtable in memory reaches its limit because of this change. What I want to know is whether it is possible for 2 SSTables to be generated in the above case and, if so, what the boundary is.

I understand that if later changes apply to the same row key as in the above example, an additional SSTable file could be generated. That is clear to me.

Yong

> From: Dean.Hiller@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 07:39:48 -0600
> Subject: Re: questions related to the SSTable file
>
> You have to first understand the rules of
>
> 1. Sstables are immutable so Color-1-Data.db will not be modified and only deleted once compacted
> 2. Memtables are flushed when reaching a limit so if Blue:{hex} is modified, it is done in the in-memory memtable that is eventually flushed
> 3. Once flushed, it is an SSTable on disk and you have two values for "hex" both with two timestamps so we know which one is the current value
>
> When it finally compacts, the old value can go away.
>
> Dean
>
> From: java8964 java8964 <java8964@hotmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, September 17, 2013 7:32 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: RE: questions related to the SSTable file
>
> Hi, Takenori:
>
> Thanks for your quick reply. Your explanation makes it clear to me what compaction means, and I can now also understand that the same row key can exist in multiple SSTable files.
>
> But beyond that, I want to know what happens if one row's data is too large to put in one SSTable file. In your example, the same row exists in multiple SSTable files because it keeps changing and being flushed to disk at runtime. That's fine. In this case, among the 4 SSTable files, no single file contains the whole data of that row, but each one does contain the full picture of an individual unit (I don't know what I should call this unit, but it will be larger than one column, right?). In your example, there is no way that at any time we could have SSTable files like the following, right?
>
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000}}]
> - Color-1-Data_1.db: [{Blue: {hex:FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
> - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
>
> I don't see any reason Cassandra would ever do that, but I just want to confirm, as your 'no' answer to my question 2 is confusing.
>
> Another question from my original email; I may already have the answer from your example, but I just want to confirm it.
> Using your example, let's say that after the first 2 steps:
>
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
>
> there is an incremental backup. After that, the following changes come in:
>
> - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
> - Delete a row of (key = Blue)
> ---- memtable is flushed => Color-3-Data.db ----
>
> Then another incremental backup is taken.
>
> Now in this case, my assumption is that only Color-3-Data.db will be in this backup, right? Even though Color-1-Data.db and Color-2-Data.db contain data for the same row keys as Color-3-Data.db, from an incremental backup point of view only Color-3-Data.db will be stored.
>
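The assumption in the quoted paragraph can be written down as a small Python model. This is only a sketch for illustration: it assumes incremental backup means "each newly flushed SSTable is also hard-linked into a backups area, and the files already collected are moved away before the next backup". The class and method names (`ColumnFamilyStore`, `flush`, `collect_incremental_backup`) are hypothetical and are not Cassandra APIs.

```python
class ColumnFamilyStore:
    """Toy model of one column family's SSTable files (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.generation = 0
        self.live_sstables = []    # files currently serving reads
        self.pending_backup = []   # flushed since the last backup was collected

    def flush(self):
        """Model a memtable flush: one new immutable SSTable file appears."""
        self.generation += 1
        filename = f"{self.name}-{self.generation}-Data.db"
        self.live_sstables.append(filename)
        # Assumption: with incremental backups enabled, only the file that
        # was just flushed is linked into the backup area; older SSTables
        # are not touched.
        self.pending_backup.append(filename)
        return filename

    def collect_incremental_backup(self):
        """Return the files flushed since the previous collection."""
        files, self.pending_backup = self.pending_backup, []
        return files


store = ColumnFamilyStore("Color")
store.flush()                                  # Color-1-Data.db
store.flush()                                  # Color-2-Data.db
first_backup = store.collect_incremental_backup()
store.flush()                                  # Color-3-Data.db
second_backup = store.collect_incremental_backup()
print(first_backup)
print(second_backup)
```

Under these assumptions the second backup holds only Color-3-Data.db, even though the rows in it also have fragments in the two older files.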
> The reason I asked these questions is that I am thinking of using MapReduce jobs to parse the incremental backup files and rebuild the snapshot on the Hadoop side. Of course, the column families I am working with hold pure fact data, so there is no delete/update in Cassandra for this kind of data, just appending. But it is still important for me to understand the SSTable file's content.
>
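A rough sketch of that rebuild step, written in plain Python rather than as an actual Hadoop job. The map phase emits one record per column keyed by row key, and the reduce phase merges all fragments of a row, keeping the value with the newest timestamp if a column shows up more than once. The input layout (one dict per backup file, mapping row key to `{column: (value, timestamp)}`) is a made-up stand-in for parsed SSTable contents and is not the real sstable2json format; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(sstable):
    """Emit (row_key, (column, value, timestamp)) for every column in one file."""
    for row_key, columns in sstable.items():
        for column, (value, timestamp) in columns.items():
            yield row_key, (column, value, timestamp)


def reduce_phase(row_key, records):
    """Merge the fragments of one row; the newest timestamp wins per column."""
    merged = {}
    for column, value, timestamp in records:
        if column not in merged or timestamp > merged[column][1]:
            merged[column] = (value, timestamp)
    return row_key, {column: value for column, (value, _) in merged.items()}


def rebuild_snapshot(sstables):
    """Group records by row key (the shuffle step), then reduce each group."""
    grouped = defaultdict(list)
    for sstable in sstables:
        for row_key, record in map_phase(sstable):
            grouped[row_key].append(record)
    return dict(reduce_phase(key, records) for key, records in grouped.items())


# Two hypothetical incremental backup files holding fragments of row "Green".
backup_1 = {"Green": {"hex": ("#008000", 3)}}
backup_2 = {"Green": {"hex2": ("#32CD32", 5)}, "Aqua": {"hex": ("#00FFFF", 6)}}
snapshot = rebuild_snapshot([backup_1, backup_2])
print(snapshot)
```

For append-only data the reducer mostly just unions the columns of a row; the timestamp comparison only matters if the same column is ever written twice.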
> Thanks
>
> Yong
>
>
> ________________________________
> Date: Tue, 17 Sep 2013 11:12:01 +0900
> From: tsato@cloudian.com
> To: user@cassandra.apache.org
> Subject: Re: questions related to the SSTable file
>
> Hi,
>
> > 1) I expect the same row key could show up in both sstable2json outputs, as this one row exists in both SSTable files, right?
>
> Yes.
>
> > 2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is that one column's data is guaranteed to be either in the first file or in the 2nd file, right? There is no chance that Cassandra will cut the data of one column into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?
>
> No.
>
> > 3) If we are talking only about the SSTable files in snapshots and incremental backups, excluding the runtime SSTable files, will anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
> > 4) If I want to use incremental backup SSTable files as the way to catch data being changed, is that a good way to do what I am trying to achieve? In this case, what happens in the following example:
>
> I don't fully understand, but a snapshot will do. It will create hard links to all the SSTable files present at the time of the snapshot.
>
>
> Let me explain how SSTables and compaction work.
>
> Suppose we have 4 files being compacted (the last one has just been flushed, which then triggered compaction). Note that file names are simplified.
>
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
> - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
>
> They are created by the following operations.
>
> - Add a row of (key, column, column_value = Blue, hex, #0000FF)
> - Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
> ---- memtable is flushed => Color-1-Data.db ----
> - Add a row of (key, column, column_value = Green, hex, #008000)
> - Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
> ---- memtable is flushed => Color-2-Data.db ----
> - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
> - Delete a row of (key = Blue)
> ---- memtable is flushed => Color-3-Data.db ----
> - Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
> - Add a row of (key, column, column_value = Gold, hex, #FFD700)
> ---- memtable is flushed => Color-4-Data.db ----
>
> Then, a compaction will merge all those fragments together into the latest ones as follows.
>
> - Color-5-Data.db: [{Lavender: {hex: #E6E6FA}}, {Aqua: {hex: #00FFFF}}, {Green: {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> * assuming RandomPartitioner is used
>
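The flush and compaction steps quoted above can be replayed with a small Python model. This is only a sketch under simplifying assumptions: a logical counter stands in for write timestamps, a row delete is stored as a row-level tombstone timestamp, and compaction drops tombstoned data immediately instead of keeping tombstones around for a grace period. `Memtable`, `compact`, and the dict layout are hypothetical names chosen for this example.

```python
import itertools

clock = itertools.count(1)   # logical timestamps instead of wall-clock time


class Memtable:
    def __init__(self):
        # row key -> {"cols": {name: (value, ts)}, "deleted_at": ts or None}
        self.rows = {}

    def _row(self, key):
        return self.rows.setdefault(key, {"cols": {}, "deleted_at": None})

    def put(self, key, column, value):
        self._row(key)["cols"][column] = (value, next(clock))

    def delete_row(self, key):
        row = self._row(key)
        row["cols"].clear()
        row["deleted_at"] = next(clock)   # row-level tombstone

    def flush(self):
        """Hand the current contents over as a new SSTable and start empty."""
        sstable, self.rows = self.rows, {}
        return sstable


def compact(sstables):
    """Merge row fragments; newest column wins; tombstoned data is dropped."""
    merged = {}
    for sstable in sstables:
        for key, frag in sstable.items():
            row = merged.setdefault(key, {"cols": {}, "deleted_at": None})
            if frag["deleted_at"] is not None:
                row["deleted_at"] = max(row["deleted_at"] or 0, frag["deleted_at"])
            for name, (value, ts) in frag["cols"].items():
                if name not in row["cols"] or ts > row["cols"][name][1]:
                    row["cols"][name] = (value, ts)
    result = {}
    for key, row in merged.items():
        cutoff = row["deleted_at"] or 0
        live = {n: v for n, (v, ts) in row["cols"].items() if ts > cutoff}
        if live:
            result[key] = live
    return result


mt = Memtable()
mt.put("Blue", "hex", "#0000FF")
mt.put("Lavender", "hex", "#E6E6FA")
color_1 = mt.flush()
mt.put("Green", "hex", "#008000")
mt.put("Blue", "hex2", "#2c86ff")
color_2 = mt.flush()
mt.put("Green", "hex2", "#32CD32")
mt.put("Aqua", "hex", "#00FFFF")
mt.delete_row("Blue")
color_3 = mt.flush()
mt.put("Magenta", "hex", "#FF00FF")
mt.put("Gold", "hex", "#FFD700")
color_4 = mt.flush()

color_5 = compact([color_1, color_2, color_3, color_4])
print(color_5)
```

In this model the Blue row disappears from the compacted output because its tombstone is newer than both of its columns, and Green ends up with `hex` and `hex2` merged from two different files. The order of rows in the printed dict is arbitrary here and does not model any partitioner.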
> Hope this helps.
>
> - Takenori
>
> (2013/09/17 10:51), java8964 java8964 wrote:
> Hi, I have some questions related to SSTables in Cassandra, as I am doing a project that uses it, and I hope someone on this list can share some thoughts.
>
> My understanding is that SSTables are per column family, but each column family could have multiple SSTable files. At runtime, one row COULD be split across more than one SSTable file; this is not good for performance, but it does happen, and Cassandra will try to merge and store one row's data into one SSTable file during compaction.
>
> The question is, when one row is split across multiple SSTable files, what is the boundary? Or let me ask it this way: if one row exists in 2 SSTable files and I run the sstable2json tool on both SSTable files individually:
>
> 1) I expect the same row key could show up in both sstable2json outputs, as this one row exists in both SSTable files, right?
> 2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is that one column's data is guaranteed to be either in the first file or in the 2nd file, right? There is no chance that Cassandra will cut the data of one column into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?
> 3) If we are talking only about the SSTable files in snapshots and incremental backups, excluding the runtime SSTable files, will anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
> 4) If I want to use incremental backup SSTable files as the way to catch data being changed, is that a good way to do what I am trying to achieve? In this case, what happens in the following example:
>
> For column family A:
> at Time 0, one row key (key1) has some data. It is stored and backed up in SSTable file 1.
> at Time 1, if any column for key1 changes (a new column inserted, a column updated/deleted, or even the whole row deleted), I expect this whole row to exist in any incremental backup SSTable files taken after Time 1, right?
>
> What happens if the above row happens to be stored in more than one SSTable file?
> at Time 0, one row key (key1) has some data, and it happens to be stored in SSTable file 1 and file 2, both of which are backed up.
> at Time 1, if one column is added to row key1, and the change in fact happens only in SSTable file 2, and we do an incremental backup after that, which SSTable files should I expect in this backup? Both SSTable files? Or just SSTable file 2?
>
> I was thinking incremental backup SSTable files would be a good candidate for catching data being changed, but the fact that one row's data could exist in multiple SSTable files makes things complex now. Does anyone have any experience using SSTable files in this way? What are the lessons?
>
> Thanks
>
> Yong
>
--_725f04c5-2f5c-4004-b581-913b3944648e_--