Do you mean, running the get_range_slices from a single? Yes, that would be reasonable for a relatively small key range. But when it comes to analyzing a really big range in a really big data collection (like the one we currently populate), having a way to distribute the reads across the cluster seems to be the only reasonable solution.
In the current situation, the best option might be to distribute the range among ColumnFamilies (say, one CF per day) and, after analyzing the data, empty each CF so it can be reused for another day's range.
Can you suggest a workaround for this?
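
One possible workaround, sketched here only as an illustration: split the time window into per-day key ranges on the client side, so that each sub-range can be handed to a separate worker. The sketch assumes keys of the form 2009.01.01.00.00.00.RANDOM as described further down the thread. The names KEY_FORMAT and day_ranges are hypothetical and not part of Cassandra or Hadoop.

```python
from datetime import datetime, timedelta

# Assumed key prefix layout, matching keys like 2009.01.01.00.00.00.RANDOM
KEY_FORMAT = "%Y.%m.%d.%H.%M.%S"


def day_ranges(start, end):
    """Split the window [start, end) into per-day (start_key, end_key) pairs.

    Each pair is a pair of key prefixes. Because the timestamp fields are
    zero-padded, plain string comparison orders them chronologically.
    """
    ranges = []
    cursor = start
    while cursor < end:
        # Step one day at a time, but never past the end of the window.
        nxt = min(cursor + timedelta(days=1), end)
        ranges.append((cursor.strftime(KEY_FORMAT), nxt.strftime(KEY_FORMAT)))
        cursor = nxt
    return ranges


if __name__ == "__main__":
    # One sub-range per day of January 2009; each could go to its own worker.
    for start_key, end_key in day_ranges(datetime(2009, 1, 1), datetime(2009, 2, 1)):
        print(start_key, "->", end_key)
```

Each pair could then be scanned independently, whether the days live in one ColumnFamily or in one CF per day.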
Sounds like doing this w/o m/r with get_range_slices is a reasonable way to go.
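
A rough sketch of what such a client-side scan could look like, paging through the key range instead of fetching it in one call. It does not use the real Thrift API: fetch_page is a hypothetical stand-in that serves keys from a sorted in-memory list, and the sketch assumes the cluster returns keys in sorted order with inclusive range bounds (which, as far as I know, needs an order-preserving partitioner).

```python
import bisect


def fetch_page(sorted_keys, start_key, end_key, count):
    """Hypothetical stand-in for a range-slice call.

    Returns up to `count` keys from the inclusive range [start_key, end_key],
    in key order, taken from an in-memory sorted list.
    """
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_right(sorted_keys, end_key)
    return sorted_keys[lo:hi][:count]


def scan_range(sorted_keys, start_key, end_key, page_size=100):
    """Yield every key in [start_key, end_key], one page at a time."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    start = start_key
    first_page = True
    while True:
        page = fetch_page(sorted_keys, start, end_key, page_size)
        # After the first page, the first key repeats the previous page's
        # last key (the start bound is inclusive), so skip it.
        for key in (page if first_page else page[1:]):
            yield key
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start = page[-1]
        first_page = False


if __name__ == "__main__":
    keys = sorted([
        "2008.12.31.23.59.59.a",
        "2009.01.01.00.00.00.b",
        "2009.01.15.12.00.00.c",
        "2009.01.31.23.59.59.d",
        "2009.02.01.00.00.00.e",
    ])
    # Scan only the keys between 2009.01 and 2009.02, two keys per page.
    for key in scan_range(keys, "2009.01", "2009.02", page_size=2):
        print(key)
```

With a real client, fetch_page would be replaced by the actual range-slice call, and the paging logic would stay the same.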
On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu <email@example.com> wrote:
> I'm currently writing collected data continuously to Cassandra, having keys
> starting with a timestamp and a unique identifier (like
> 2009.01.01.00.00.00.RANDOM) for being able to query in time ranges.
> I'm thinking of running periodic MapReduce jobs which will go through a
> designated time period. I might want to analyze the data only between
> 2009.01 and 2009.02.
> I had done this previously with HBase; however, I thought Cassandra would be a
> better choice for continuously storing data in a safe manner.
> I guess this briefly explains my designated use case.
> Best Regards,
> On Thu, Apr 29, 2010 at 11:32 PM, Jonathan Ellis <firstname.lastname@example.org> wrote:
>> It's technically possible but 0.6 does not support this, no.
>> What is the use case you are thinking of?
>> On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu <email@example.com>
>> > Hi,
>> > I've been trying to use Cassandra for some kind of a supplementary input
>> > source for Hadoop MapReduce jobs.
>> > By default, ColumnFamilyInputFormat does a full column family scan to
>> > provide the map input for the MapReduce framework.
>> > However, I believe it should be possible to give it a key range so that
>> > only the specified range is scanned.
>> > Is it by any means possible?
>> > Best Regards,
>> > Utku
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support