cassandra-user mailing list archives

From Mohammed Guller <moham...@glassbeam.com>
Subject RE: select many rows one time or select many times?
Date Fri, 01 Aug 2014 22:29:07 GMT
Did you benchmark these two options:

1) Select with IN

2) Select all words and filter in application
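Option 2 comes down to a set-membership check on the client. The following is only a minimal sketch of that step, assuming the rows of one user's partition have already been fetched; they are modeled here as plain (word, flag) entries rather than driver rows. The class and method names are hypothetical and not part of any driver API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the DataStax driver.
class ClientSideFilter {

    // Keeps only the (word, flag) rows whose word is in the wanted set.
    // `rows` stands in for whatever the client reads back for one user.
    static Map<String, Double> keepWanted(Iterable<Map.Entry<String, Double>> rows,
                                          Set<String> wanted) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> row : rows) {
            if (wanted.contains(row.getKey())) { // O(1) lookup in a HashSet
                kept.put(row.getKey(), row.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for one user's partition.
        Map<String, Double> partition = new LinkedHashMap<>();
        partition.put("a", 1.0);
        partition.put("b", 2.0);
        partition.put("zz", 3.0);

        Set<String> wanted = new HashSet<>(Arrays.asList("a", "zz", "missing"));
        System.out.println(keepWanted(partition.entrySet(), wanted));
    }
}
```

The filter itself is cheap; with this option the cost sits in reading and transferring the whole partition, which is what a benchmark against the IN query would measure.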

Mohammed

From: Philo Yang [mailto:ud1937@gmail.com]
Sent: Thursday, July 31, 2014 10:45 AM
To: user@cassandra.apache.org
Subject: select many rows one time or select many times?

Hi all,

I have a Cassandra 2.0.6 cluster, and one of my tables looks like this:
CREATE TABLE word (
  user text,
  word text,
  flag double,
  PRIMARY KEY (user, word)
)

Each "user" has about 10000 "word" rows per node. I need to select all rows where
user='someuser' and word is in a large set of about 1000 values.

The C* documentation does not recommend using "select ... in", as in:

select * from word where user='someuser' and word in ('a','b','aa','ab',...)

So for now I select all rows where user='someuser' and filter them on the client rather than
in C*. Of course, I use the DataStax Java Driver to page the result set with
setFetchSize(1000). Is this the best way? I found that the system's load is high because of
the large range query. Should I instead select only one row at a time and issue 1000 queries?

Like this:
select * from word where user='someuser' and word = 'a';
select * from word where user='someuser' and word = 'b';
select * from word where user='someuser' and word = 'c';
.....

Which method puts less pressure on the Cassandra cluster?
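For comparison, the query shapes discussed above can be sketched as CQL strings with bind markers. This is only an illustration: the helper below builds strings and never talks to a cluster, and its class and method names are hypothetical. The chunked IN variant is an assumption added here as a possible middle ground between one 1000-element IN and 1000 single-row selects; it was not proposed in this thread, and whether it lowers the load would have to be measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; it only builds CQL text.
class WordQueryPlanner {

    // One query per word: bind (user, word) and execute once per word.
    static final String SINGLE_ROW =
            "SELECT * FROM word WHERE user = ? AND word = ?";

    // IN query with n bind markers for the clustering column `word`.
    static String inQuery(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        String markers = String.join(", ", Collections.nCopies(n, "?"));
        return "SELECT * FROM word WHERE user = ? AND word IN (" + markers + ")";
    }

    // Splits the wanted words into consecutive chunks of at most `size` items.
    static List<List<String>> chunk(List<String> words, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < words.size(); i += size) {
            int end = Math.min(i + size, words.size());
            chunks.add(new ArrayList<>(words.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("a", "b", "aa", "ab", "c");
        // The chunk size of 2 is arbitrary, chosen only for the demo.
        for (List<String> part : chunk(wanted, 2)) {
            System.out.println(inQuery(part.size()) + "  <- " + part);
        }
    }
}
```

Each chunk would be bound together with the user value and executed as its own statement, so the chunk size trades the number of round trips against the size of each IN list.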

Thanks,
Philo Yang
