Thank you for mentioning the expiring columns issue; I didn't know it existed.
That's really great news.
First of all, does the current 0.6 branch support it? If not, is a patch available for 0.6.5 somehow?
And about the deletion issue: what happens if all the columns in a row expire? Once the row is deleted, will I still see the row in my map inputs, and for how long?
A simpler approach might be to insert expiring columns into a 2nd CF
with a TTL of one hour.
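To make the suggestion concrete, here is a small sketch of how expiring columns behave. This is a toy Python model, not Cassandra's actual client API (the class and method names here are invented for illustration): each column carries an expiry timestamp, and reads simply skip expired columns, which mirrors how expired data becomes invisible to readers before compaction physically removes it.

```python
import time


class TTLColumnFamily:
    """Toy model of expiring columns (hypothetical, not the real Cassandra API).

    Each column stores (value, expires_at); reads return only the columns
    whose expiry time has not yet passed.
    """

    def __init__(self):
        # row_key -> {column_name: (value, expires_at)}
        self._rows = {}

    def insert(self, row_key, columns, ttl_seconds, now=None):
        """Insert columns that expire ttl_seconds from now."""
        now = time.time() if now is None else now
        row = self._rows.setdefault(row_key, {})
        for name, value in columns.items():
            row[name] = (value, now + ttl_seconds)

    def get(self, row_key, now=None):
        """Return only the live (unexpired) columns of a row."""
        now = time.time() if now is None else now
        row = self._rows.get(row_key, {})
        return {name: value
                for name, (value, expires_at) in row.items()
                if expires_at > now}


# Insert an hour's worth of data into the "second CF" with a one-hour TTL.
cf = TTLColumnFamily()
cf.insert("2010100412", {"hits": "42"}, ttl_seconds=3600, now=1000.0)

# Within the hour the row is visible; after the TTL it reads back empty.
live = cf.get("2010100412", now=2000.0)
expired = cf.get("2010100412", now=1000.0 + 3601)
```

Under this scheme, a Map/Reduce job over the second column family would naturally see only roughly the last hour of data, since older columns have already expired.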
On Mon, Oct 4, 2010 at 5:12 AM, Utku Can Topçu <email@example.com> wrote:
> Hey All,
> I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are
> formed in such a fashion that, they are indexed in descending order by time.
> So I'll be analyzing the data for every hour iteratively.
> Since the current Hadoop integration does not support partial column family
> analysis, I feel that I'll need to dump the data of the last hour, put it
> on the Hadoop cluster, and do my analysis on the flat text file.
> Do you think of any other "better" way of getting the data of a keyrange
> into a hadoop cluster for analysis?
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support