If you want to process millions of rows at a time, take a look at the Hadoop and Pig integration. Try Cloudera's Hadoop distribution, CDH3; it includes Pig.

Pig is a SQL-like language for large-scale data analysis. Pig scripts compile down to Java MapReduce jobs that run on Hadoop.
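Most of the speedup comes from parallelism. Instead of one client walking the whole key range, a Hadoop job roughly splits the ring into token ranges and scans them concurrently. A minimal Python sketch of that idea follows. It assumes RandomPartitioner's token space of roughly 0 to 2**127, treated as half-open here for simplicity. `split_ring`, `scan_in_parallel` and `count_rows` are hypothetical illustrations, not part of Cassandra's or Hadoop's API.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Assumption: RandomPartitioner tokens lie (approximately) in [0, 2**127).
RING_MAX = 2 ** 127


def split_ring(num_splits: int, ring_max: int = RING_MAX) -> List[Tuple[int, int]]:
    """Divide [0, ring_max) into num_splits contiguous (start, end) token ranges."""
    bounds = [ring_max * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_in_parallel(scan_split: Callable[[int, int], int],
                     num_splits: int, workers: int = 8) -> int:
    """Run scan_split on every token range concurrently and sum the results."""
    splits = split_ring(num_splits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: scan_split(*r), splits))


# Demo data: random tokens standing in for row keys spread over the ring.
random.seed(0)
tokens = [random.randrange(RING_MAX) for _ in range(10_000)]


def count_rows(start: int, end: int) -> int:
    # Hypothetical stand-in for one map task scanning a single token range.
    return sum(1 for t in tokens if start <= t < end)


print(scan_in_parallel(count_rows, num_splits=16))  # 10000
```

Because the splits are contiguous and cover the whole ring, every row is counted exactly once, and each split can be scanned by a different worker or node.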

There are examples in the contrib directory of the Cassandra source and some information on the wiki.

I'd be interested to know how you get on, as hopefully I'll get to play with it soon.

On 29 Jul, 2010, at 01:51 PM, Ken Matsumoto <ken@nri.com> wrote:

Hi all,

Is there a better way to retrieve data from Cassandra than using get_range_slices?

I'm going to port some programs that use MySQL to Cassandra. A typical query is:
"select * from Table_A where date > 1/1/2008 and date < 12/31/2009 and locationID = 1"
This query returns over 1M records at a time.
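A range scan alone does not evaluate a WHERE clause like this one. Assuming no server-side filtering on column values is available in this setup, the client or a map task has to apply the predicate itself. A minimal Python sketch of that filter follows. It assumes each row has already been decoded into a dict with hypothetical `date` and `locationID` fields, and it keeps the query's strict inequalities.

```python
from datetime import date
from typing import Dict, Iterable, Iterator

# Bounds from the query; both comparisons are strict, as in the SQL.
START = date(2008, 1, 1)
END = date(2009, 12, 31)


def matches(row: Dict[str, object]) -> bool:
    """True if date > 1/1/2008 and date < 12/31/2009 and locationID = 1."""
    return START < row["date"] < END and row["locationID"] == 1


def select(rows: Iterable[Dict[str, object]]) -> Iterator[Dict[str, object]]:
    """Lazily yield only the rows that satisfy the WHERE clause."""
    return (r for r in rows if matches(r))


rows = [
    {"date": date(2008, 1, 1), "locationID": 1},   # excluded: lower bound is strict
    {"date": date(2008, 6, 15), "locationID": 1},  # included
    {"date": date(2009, 3, 2), "locationID": 2},   # excluded: other location
]
print(len(list(select(rows))))  # 1
```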

In Cassandra, get_range_slices can only return 600 rows per call on our hardware, so we have to call it many times, and iterating sequentially like this takes a lot of time.
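The sequential paging described here looks roughly like the minimal Python sketch below. `fetch_page` is a hypothetical stand-in for one get_range_slices call, not the real Thrift signature. It assumes the start key is inclusive, so each page after the first repeats the previous page's last row. The in-memory store exists only to make the sketch runnable, and it returns keys in sorted order, while a real cluster returns them in partitioner order. The loop is slow because each call must wait for the previous page's last key before it can start.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A row is (key, columns); columns are a plain dict here for illustration.
Row = Tuple[str, Dict[str, str]]


def iter_range(fetch_page: Callable[[str, int], List[Row]],
               page_size: int = 600) -> Iterator[Row]:
    """Yield every row once by paging through the key range sequentially."""
    if page_size < 2:
        raise ValueError("page_size must be >= 2 (one slot is the repeated key)")
    start_key = ""  # hypothetical convention: "" means the start of the range
    first_page = True
    while True:
        page = fetch_page(start_key, page_size)
        # The start key is inclusive, so later pages begin with a duplicate.
        yield from (page if first_page else page[1:])
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start_key = page[-1][0]
        first_page = False


def make_fake_fetch(data: Dict[str, Dict[str, str]]) -> Callable[[str, int], List[Row]]:
    """In-memory stand-in: up to `count` rows with key >= start_key, in key order."""
    keys = sorted(data)

    def fetch_page(start_key: str, count: int) -> List[Row]:
        selected = [k for k in keys if k >= start_key][:count]
        return [(k, data[k]) for k in selected]

    return fetch_page


store = {f"row{i:04d}": {"locationID": "1"} for i in range(1500)}
rows = list(iter_range(make_fake_fetch(store), page_size=600))
print(len(rows))  # 1500
```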

Is Cassandra simply not suitable for this kind of usage?

Best regards,


Ken Matsumoto
VP / Research & Development
Nomura Research Institute America, Inc.
NRI Pacific
1400 Fashion Island Blvd., Suite 1010
San Mateo, CA 94404, U.S.A.

PLEASE READ: This e-mail is confidential and intended for the named
recipient only. If you are not an intended recipient, please notify the
sender and delete this e-mail.