cassandra-user mailing list archives

From Utku Can Topçu <u...@topcu.gen.tr>
Subject Re: ColumnFamilyInputFormat KeyRange scans on a CF
Date Fri, 30 Apr 2010 14:08:32 GMT
Do you mean running get_range_slices from a single client? Yes, that would be
reasonable for a relatively small key range. But when it comes to analyzing a
really big range in a really big data collection (like the one we are
currently populating), distributing the reads across the cluster seems to be
the only reasonable solution.
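
To make the idea of distributing the reads concrete, here is a minimal sketch
of how a time period could be cut into contiguous key sub-ranges, one per
worker. It assumes the key layout quoted below (timestamp prefix, then a unique
suffix) and a partitioner that keeps keys in sorted order, since otherwise a
key range does not correspond to a time range. The names KEY_TIME_FORMAT,
key_prefix and split_time_range are hypothetical, and no Cassandra client call
is shown; each (start_key, end_key) pair would be handed to whatever range
query the client uses.

```python
from datetime import datetime

# Assumed key layout from the thread: 2009.01.01.00.00.00.RANDOM
KEY_TIME_FORMAT = "%Y.%m.%d.%H.%M.%S"


def key_prefix(ts):
    """Return the timestamp part of a row key for the given datetime."""
    return ts.strftime(KEY_TIME_FORMAT)


def split_time_range(start, end, parts):
    """Split [start, end) into `parts` contiguous (start_key, end_key) pairs.

    Each pair bounds the keys of one sub-range. Because real keys carry a
    suffix after the timestamp, no stored key equals a bare prefix, so
    adjacent sub-ranges do not share any row.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("end must be after start")
    step = (end - start) / parts
    # parts + 1 boundaries; the last one is pinned to `end` exactly.
    bounds = [start + step * i for i in range(parts)] + [end]
    return [(key_prefix(lo), key_prefix(hi)) for lo, hi in zip(bounds, bounds[1:])]
```

For example, split_time_range(datetime(2009, 1, 1), datetime(2009, 2, 1), 31)
would yield one key range per day of January 2009, and each range could be
scanned by a separate worker.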

As things stand, the best option might be to spread the range across
ColumnFamilies (say, one CF per day) and, once a day's data has been analyzed,
to empty that CF so it can be reused for another day.
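
A rough sketch of the one-CF-per-day rotation, assuming a fixed pool of
column families that already exist and are reused in round-robin order. The
"DailyData_" naming scheme, the pool size, and both function names are
hypothetical; emptying the CF itself is left out because it depends on the
client and the Cassandra version.

```python
from datetime import date, timedelta

CF_NAME_PREFIX = "DailyData_"  # hypothetical naming scheme


def cf_for_day(day, pool_size):
    """Map a calendar day to one of `pool_size` reusable column families."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # toordinal() increases by one per day, so consecutive days rotate
    # through the pool and a slot comes around again after pool_size days.
    return CF_NAME_PREFIX + str(day.toordinal() % pool_size)


def cf_to_recycle(today, pool_size):
    """Return the CF that tomorrow's writes will land in.

    This is the slot that has to be analyzed and emptied before tomorrow.
    """
    return cf_for_day(today + timedelta(days=1), pool_size)
```

With pool_size=7, each CF holds one day of data and has to be analyzed and
emptied within a week, before its slot is written to again.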

Can you suggest a workaround for this?

On Fri, Apr 30, 2010 at 3:22 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Sounds like doing this w/o m/r with get_range_slices is a reasonable way to
> go.
>
> On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu <utku@topcu.gen.tr> wrote:
> > I'm currently writing collected data continuously to Cassandra, with keys
> > that start with a timestamp followed by a unique identifier (like
> > 2009.01.01.00.00.00.RANDOM), so that I can query by time range.
> >
> > I'm thinking of running periodic mapreduce jobs that will go through a
> > designated time period. For example, I might want to analyze only the data
> > between 2009.01 and 2009.02.
> > I had done this previously with HBase; however, I thought Cassandra would
> > be a better choice for continuously storing data in a safe manner.
> >
> > I guess this briefly explains my designated use case.
> >
> > Best Regards,
> > Utku
> >
> > On Thu, Apr 29, 2010 at 11:32 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> It's technically possible but 0.6 does not support this, no.
> >>
> >> What is the use case you are thinking of?
> >>
> >> On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu <utku@topcu.gen.tr> wrote:
> >> > Hi,
> >> >
> >> > I've been trying to use Cassandra as a supplementary input source for
> >> > Hadoop MapReduce jobs.
> >> >
> >> > By default, the ColumnFamilyInputFormat does a full columnfamily scan
> >> > and feeds the result to the MapReduce framework as map input.
> >> >
> >> > However, I believe it should be possible to give it a key range so
> >> > that only the specified range is scanned.
> >> >
> >> > Is this possible by any means?
> >> >
> >> > Best Regards,
> >> >
> >> > Utku
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of Riptano, the source for professional Cassandra support
> >> http://riptano.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
