incubator-cassandra-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: questions related to the SSTable file
Date Tue, 17 Sep 2013 18:37:19 GMT
Another question: the SSTable files captured by an incremental backup are not really
ONLY the incremental delta, right? They can include more than the delta.
Let me use an example to show what I mean.
First, we have this data in SSTable file 1:
rowkey(1), columns (maker=honda).
Later, we add one column to the same key:
rowkey(1), columns (maker=honda, color=blue)
The data above is flushed to another SSTable, file 2. In this case, file 2 becomes part of the
incremental backup at this time. But in fact, it contains both the old data (maker=honda)
and the new change (color=blue).
So in fact, an incremental backup in Cassandra just hard links all the new SSTable files
generated during the incremental backup period. Those files could contain any data, not just
the data updated/inserted/deleted in this period, correct?
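
To make the question concrete, here is a minimal Python sketch of the hard-link behaviour described above. It is not Cassandra code: the helper name hard_link_backup, the directory layout and the file names sstable-1.db / sstable-2.db are all made up for illustration, and the file contents simply mirror the example in this message. It only shows that a hard link captures the whole new file, whatever that file happens to contain.

```python
import os
import tempfile


def hard_link_backup(data_dir, backup_dir, already_backed_up):
    """Hard-link every file in data_dir that is not in already_backed_up.

    Hypothetical helper: the link is made per file, so the backup gets the
    complete file, not a row-level or column-level delta.
    """
    os.makedirs(backup_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if name in already_backed_up or not os.path.isfile(src):
            continue
        os.link(src, os.path.join(backup_dir, name))  # same inode, no copy
        linked.append(name)
    return linked


def demo():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        backup_dir = os.path.join(root, "backups")
        os.makedirs(data_dir)

        # File 1 exists before the backup period starts (made-up name).
        with open(os.path.join(data_dir, "sstable-1.db"), "w") as f:
            f.write("rowkey(1): maker=honda\n")
        seen = set(os.listdir(data_dir))

        # File 2 is flushed during the backup period, as in the example.
        with open(os.path.join(data_dir, "sstable-2.db"), "w") as f:
            f.write("rowkey(1): maker=honda, color=blue\n")

        linked = hard_link_backup(data_dir, backup_dir, seen)
        with open(os.path.join(backup_dir, "sstable-2.db")) as f:
            content = f.read()
        same_file = os.path.samefile(
            os.path.join(data_dir, "sstable-2.db"),
            os.path.join(backup_dir, "sstable-2.db"),
        )
        return linked, content, same_file


if __name__ == "__main__":
    print(demo())
```

Only the new file is linked, but the linked file carries everything that was flushed into it, which is the point of the question.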
Thanks
Yong

> From: Dean.Hiller@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 08:11:36 -0600
> Subject: Re: questions related to the SSTable file
> 
> Netflix created file streaming in astyanax into cassandra specifically because writing
> too big a column cell is a bad thing. The limit is really dependent on use case... do you
> have servers writing 1000's of 200Meg files at the same time? If so, astyanax streaming
> may be a better way to go there, where it divides up the file amongst cells and rows.
> 
> I know the limit of a row size is really your hard disk space, and the column count, if
> I remember, goes into billions, though realistically I think beyond 10 million might slow down
> a bit... all I know is we tested up to 10 million columns with no issues in our use-case.
> 
> So you mean at this time, I could get 2 SSTable files, both containing the column "Blue" for
> the same row key, right?
> 
> Yes
> 
> In this case, I should be fine, as the value of the "Blue" column carries a timestamp to
> help me find out which is the last change, right?
> 
> Yes
> 
> In MR world, each file COULD be processed by a different Mapper, but both will be sent to the
> same reducer, as the data will share the same key.
> 
> If that is the way you are writing it, then yes
> 
> Dean
> 
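
The two "Yes" answers above (the same column can appear in two SSTable files, and the timestamp tells you which is the last change) can be sketched as a tiny in-memory map/reduce in Python. This is only an illustration under assumptions: the tuple layout (row_key, column, value, timestamp), the function names and the timestamp values are invented here, deletes (tombstones) and timestamp ties are not handled, and no real Hadoop or Cassandra API is used.

```python
from collections import defaultdict


def map_sstable(cells):
    """Mapper: emit (row_key, cell) for every cell read from one file."""
    for row_key, column, value, timestamp in cells:
        yield row_key, (column, value, timestamp)


def reduce_row(cells):
    """Reducer: for each column keep the value with the highest timestamp."""
    latest = {}
    for column, value, timestamp in cells:
        if column not in latest or timestamp > latest[column][1]:
            latest[column] = (value, timestamp)
    return {column: value for column, (value, _) in latest.items()}


def run(sstables):
    """Run each file through its own mapper, then group by row key."""
    grouped = defaultdict(list)
    for cells in sstables:
        for row_key, cell in map_sstable(cells):
            grouped[row_key].append(cell)
    return {row_key: reduce_row(cells) for row_key, cells in grouped.items()}


# Hypothetical contents: both files hold the "color" column for row key "1".
sstable_1 = [("1", "maker", "honda", 100), ("1", "color", "red", 100)]
sstable_2 = [("1", "color", "blue", 200)]

if __name__ == "__main__":
    print(run([sstable_1, sstable_2]))
```

Because both files emit the same row key, their cells land in the same reducer call, and the newer "color" value wins regardless of which file is read first.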
> From: Shahab Yunus <shahab.yunus@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, September 17, 2013 7:54 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: questions related to the SSTable file
> 
> derstand if following changes apply to the same row key as above example, additional
SSTable file could be generated. That is