From: Takenori Sato
To: "user@cassandra.apache.org"
Date: Wed, 18 Sep 2013 09:46:53 +0900
Subject: Re: questions related to the SSTable file

> So in fact, incremental backup of Cassandra just hard links all the new SSTable files generated during the incremental backup period. They could contain any data, not just the data being updated/inserted/deleted in this period, correct?

Correct.

But over time, some old enough SSTable files are usually shared across multiple snapshots.

On Wed, Sep 18, 2013 at 3:37 AM, java8964 java8964 <java8964@hotmail.com> wrote:

> Another question: the SSTable files generated in the incremental backup are not really ONLY the incremental delta, right? They will include more than the delta.
>
> I will use an example to show my question:
>
> First, we have this data in SSTable file 1:
>
> rowkey(1), columns (maker=honda).
>
> Later, we add one column to the same key:
>
> rowkey(1), columns (maker=honda, color=blue)
>
> The data above is flushed to another SSTable file 2. In this case, it will be part of the incremental backup at this time. But in fact, it will contain both old data (maker=honda) plus new changes (color=blue).
>
> So in fact, incremental backup of Cassandra just hard links all the new SSTable files generated during the incremental backup period. They could contain any data, not just the data being updated/inserted/deleted in this period, correct?
>
> Thanks
>
> Yong
>
> > From: Dean.Hiller@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 08:11:36 -0600
> > Subject: Re: questions related to the SSTable file
> >
> > Netflix created file streaming in astyanax into cassandra specifically because writing too big a column cell is a bad thing. The limit is really dependent on use case... do you have servers writing 1000's of 200Meg files at the same time? If so, astyanax streaming may be a better way to go there, where it divides up the file amongst cells and rows.
> >
> > I know the limit of a row size is really your hard disk space, and the column count, if I remember, goes into billions, though realistically I think beyond 10 million might slow down a bit... all I know is we tested up to 10 million columns with no issues in our use-case.
> >
> > So you mean at this time, I could get 2 SSTable files, both containing column "Blue" for the same row key, right?
> >
> > Yes
> >
> > In this case, I should be fine, as the value of the "Blue" column contains the timestamp to help me find out which is the last change, right?
> >
> > Yes
> >
> > In the MR world, each file COULD be processed by a different Mapper, but both will be sent to the same reducer as the data share the same key.
> >
> > If that is the way you are writing it, then yes
> >
> > Dean
> >
> > From: Shahab Yunus
> > Reply-To: "user@cassandra.apache.org"
> > Date: Tuesday, September 17, 2013 7:54 AM
> > To: "user@cassandra.apache.org"
> > Subject: Re: questions related to the SSTable file
> >
> > derstand if following changes apply to the same row key as above example, additional SSTable file could be generated. That is
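To make the timestamp point in the thread concrete, here is a minimal sketch of how a reader (for example a reducer that receives the same row key from several SSTable files) could pick the latest value per column. It is only an illustration: the in-memory dict layout, the name `merge_row`, and the sample timestamps are assumptions made for this example, not Cassandra's on-disk format or API, and deletions (tombstones) and timestamp ties are not modelled.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: each "SSTable" is a dict of
# row key -> {column name: (value, write timestamp)}.
Column = Tuple[str, int]
SSTable = Dict[str, Dict[str, Column]]


def merge_row(row_key: str, sstables: Iterable[SSTable]) -> Dict[str, Column]:
    """Return one column map for row_key, keeping the newest write per column."""
    merged: Dict[str, Column] = {}
    for table in sstables:
        for name, (value, ts) in table.get(row_key, {}).items():
            # Keep the column version with the strictly larger timestamp.
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged


# The same row key appears in three files; "color" was written twice.
sstable_1: SSTable = {"1": {"maker": ("honda", 100)}}
sstable_2: SSTable = {"1": {"color": ("blue", 200)}}
sstable_3: SSTable = {"1": {"color": ("red", 300)}}

latest = merge_row("1", [sstable_1, sstable_2, sstable_3])
print(latest)
```

The order in which the files are read does not matter here, because the choice depends only on the timestamps, which is why different mappers can process the files independently.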