incubator-cassandra-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: questions related to the SSTable file
Date Tue, 17 Sep 2013 21:42:28 GMT
Thanks Robert for the answer. It makes sense. If that happens then it means
that your design or use case needs some rework ;)

Regards,
Shahab


On Tue, Sep 17, 2013 at 2:37 PM, java8964 java8964 <java8964@hotmail.com> wrote:

> Another question: the SSTable files generated for the incremental backup
> are not really ONLY the incremental delta, right? They will include more
> than the delta.
>
> I will use an example to show my question:
>
> First, we have this data in SSTable file 1:
>
> rowkey(1), columns (maker=honda).
>
> Later, we add one column to the same key:
>
> rowkey(1), columns (maker=honda, color=blue)
>
> The data above is flushed to another SSTable file 2. In this case, it
> will be part of the incremental backup at this time. But in fact, it will
> contain both the old data (maker=honda) and the new change (color=blue).
>
> So in fact, Cassandra's incremental backup just hard links all the new
> SSTable files generated during the incremental backup period. They
> could contain any data, not just the data updated/inserted/deleted in
> this period, correct?
>
> Thanks
>
> Yong
>
> > From: Dean.Hiller@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 08:11:36 -0600
> > Subject: Re: questions related to the SSTable file
> >
> > Netflix created file streaming in Astyanax for Cassandra specifically
> > because writing too big a column cell is a bad thing. The limit really
> > depends on the use case. Do you have servers writing thousands of 200 MB
> > files at the same time? If so, Astyanax streaming may be a better way to
> > go, since it divides up the file among cells and rows.
> >
> > I know the limit on row size is really your hard disk space, and the
> > column count, if I remember correctly, goes into the billions, though
> > realistically I think beyond 10 million it might slow down a bit. All I
> > know is we tested up to 10 million columns with no issues in our use case.
> >
> > So you mean at this time, I could get 2 SSTable files, both containing
> > column "Blue" for the same row key, right?
> >
> > Yes
> >
> > In this case, I should be fine, as the value of the "Blue" column contains
> > the timestamp to help me find out which is the last change, right?
> >
> > Yes
> >
> > In the MR world, each file COULD be processed by a different Mapper, but
> > both will be sent to the same reducer, as both pieces of data share the
> > same key.
> >
> > If that is the way you are writing it, then yes
> >
> > Dean
> >
> > From: Shahab Yunus <shahab.yunus@gmail.com>
> > Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> > Date: Tuesday, September 17, 2013 7:54 AM
> > To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> > Subject: Re: questions related to the SSTable file
> >
> > derstand if following changes apply to the same row key as above
> example, additional SSTable file could be generated. That is
>
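
The exchange in the quoted thread (the same column can appear in more than one SSTable file, the timestamp tells you which write is the latest, and a reducer sees all copies for one row key) can be sketched roughly as below. This is only an illustration, not Cassandra's or Hadoop's actual API: the flattened `Cell` tuple, the function names, and the timestamp values are all assumptions made up for the example, and deletes (tombstones) are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical flattened view of one SSTable cell:
# (row_key, column_name, value, write_timestamp)
Cell = Tuple[str, str, str, int]


def map_sstable(cells: Iterable[Cell]) -> Iterator[Tuple[str, Tuple[str, str, int]]]:
    """Mapper: emit every cell keyed by its row key."""
    for row_key, column, value, ts in cells:
        yield row_key, (column, value, ts)


def reduce_row(values: Iterable[Tuple[str, str, int]]) -> Dict[str, str]:
    """Reducer: for each column keep the value with the highest timestamp."""
    latest: Dict[str, Tuple[str, int]] = {}
    for column, value, ts in values:
        if column not in latest or ts > latest[column][1]:
            latest[column] = (value, ts)
    return {column: value for column, (value, _ts) in latest.items()}


def run(sstables: List[List[Cell]]) -> Dict[str, Dict[str, str]]:
    """Simulate the shuffle: group mapper output by row key, then reduce."""
    grouped = defaultdict(list)
    for cells in sstables:
        for row_key, value in map_sstable(cells):
            grouped[row_key].append(value)
    return {row_key: reduce_row(values) for row_key, values in grouped.items()}


# Three made-up SSTable files that overlap on row key "1".
sstable_1 = [("1", "maker", "honda", 100)]
sstable_2 = [("1", "maker", "honda", 100), ("1", "color", "blue", 200)]
sstable_3 = [("1", "color", "red", 300)]

merged = run([sstable_1, sstable_2, sstable_3])
print(merged)
```

Each file can go to a different mapper; because the mapper output is keyed by row key, every copy of a column for that row lands in the same reduce call, where the timestamp comparison resolves it.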
