cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: any better way to retrieve data than using get_range_slices
Date Thu, 29 Jul 2010 02:20:21 GMT
If you want to process millions of rows at a time, take a look at the Hadoop and Pig integration.
Try the Cloudera distribution of Hadoop, CDH3; it includes Pig.

Pig is a SQL-like language for large-scale data analysis. Its scripts compile down to Java
MapReduce jobs that run on Hadoop.
http://hadoop.apache.org/pig/

There are examples in the contrib directory in the source and some information in the wiki.

I'd be interested to know how you get on, as hopefully I'll get to play with it soon.
Aaron


On 29 Jul 2010, at 01:51 PM, Ken Matsumoto <ken@nri.com> wrote:

> Hi all,
>
> Is there any better way to retrieve data from Cassandra than using
> get_range_slices?
>
> I'm now porting some programs from MySQL to Cassandra. A typical
> query in these programs looks like this:
> "select * from Table_A where date > 1/1/2008 and date < 12/31/2009 and
> locationID = 1"
> The query returns over 1M records at a time.
>
> In Cassandra, get_range_slices can return only 600 rows per call on our
> hardware.
> We have to call get_range_slices many times, and doing so sequentially
> takes a long time.
>
> Is Cassandra unsuitable for this kind of usage?
>
> Best regards,
>
> Ken.
>
> -- 
> Ken Matsumoto
> VP / Research & Development
> Nomura Research Institute America, Inc.
> NRI Pacific
> 1400 Fashion Island Blvd., Suite 1010
> San Mateo, CA 94404, U.S.A.
>
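
The iteration described in the question above can be sketched roughly as follows. This is a minimal Python illustration, not Cassandra client code: fetch_range is a hypothetical stand-in for a single get_range_slices call, backed here by an in-memory dict, and ROWS, SORTED_KEYS and iter_all_rows are names made up for the example. It assumes the start key of a range request is inclusive, so each page after the first begins with the last row of the previous page and that row has to be skipped.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a column family: row key -> columns.
ROWS = {f"row{i:04d}": {"locationID": 1} for i in range(25)}
SORTED_KEYS = sorted(ROWS)


def fetch_range(start_key, count):
    """Hypothetical stand-in for one get_range_slices call.

    Returns up to `count` (key, columns) pairs, starting at `start_key`
    inclusive; an empty start key means "from the beginning".
    """
    begin = bisect_left(SORTED_KEYS, start_key) if start_key else 0
    return [(k, ROWS[k]) for k in SORTED_KEYS[begin:begin + count]]


def iter_all_rows(page_size):
    """Yield every row once by chaining fixed-size range requests."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start_key = ""
    first_page = True
    while True:
        page = fetch_range(start_key, page_size)
        # The start key is inclusive, so every page after the first
        # begins with the row that ended the previous page; skip it.
        fresh = page if first_page else page[1:]
        for key, columns in fresh:
            yield key, columns
        if len(page) < page_size:
            break  # a short page means the range is exhausted
        start_key = page[-1][0]
        first_page = False


all_keys = [key for key, _ in iter_all_rows(page_size=10)]
```

Each request depends on the last key of the previous one, so the loop is inherently sequential. That is the cost the question describes, and it is the reason a Hadoop job that splits the key range across workers tends to suit scans of this size better.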
