Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8169E117D2 for ; Fri, 1 Aug 2014 12:35:02 +0000 (UTC) Received: (qmail 59055 invoked by uid 500); 1 Aug 2014 12:35:00 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 59022 invoked by uid 500); 1 Aug 2014 12:35:00 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 59012 invoked by uid 99); 1 Aug 2014 12:34:59 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Aug 2014 12:34:59 +0000 X-ASF-Spam-Status: No, hits=-6.5 required=5.0 tests=ENV_AND_HDR_SPF_MATCH,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,USER_IN_DEF_SPF_WL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of michael.laing@nytimes.com designates 209.85.216.44 as permitted sender) Received: from [209.85.216.44] (HELO mail-qa0-f44.google.com) (209.85.216.44) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Aug 2014 12:34:56 +0000 Received: by mail-qa0-f44.google.com with SMTP id f12so3836705qad.31 for ; Fri, 01 Aug 2014 05:34:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=0gWTEPFB/b3q+cH51Kp/G3ZfEgHa/MPKRWMw2bZo4hg=; b=jDdW5Ot7MoiwgrR2xMQf49QJndBFOatraoP3xjF8G41e4DZmRv3p2cpcbQ+4Ynu8xR y3WDSLtoXESq4uT0J6Uo1U2c2jDoC0vE7gJq0I9ij4U2ztwPFt8hCzVxMx9E6Qel2D5/ ofQaPs++N1WOwIndZamqlaRto0u7HMEEMW4cqnztyMtycn1c+8XEFn+CEdzXhppZMjdB fUKyi7mPjMW60K0qkcd1QavxFIXa8cN9VcpaA4ZgrLsrvCMz4ThskSCSUIXVIZdSLLdU qJfQ2/nypHwfDvu0yZobxBvQV4m48TZtQTpq92VlLY1VuGpbp2KYCihw2UZm8Dfqfz/E iQZA== X-Gm-Message-State: ALoCoQnm1Qvq57KaJlgRTvXSn2Dos0g+A/YFHIyCt+K0aHIWEf48SqlXMDOy3JLwzQthlvJ5l8fC MIME-Version: 1.0 X-Received: by 10.140.25.11 with SMTP id 11mr8195417qgs.9.1406896470371; Fri, 01 Aug 2014 05:34:30 -0700 (PDT) Received: by 10.140.95.4 with HTTP; Fri, 1 Aug 2014 05:34:30 -0700 (PDT) In-Reply-To: References: Date: Fri, 1 Aug 2014 05:34:30 -0700 Message-ID: Subject: Re: select many rows one time or select many times? From: "Laing, Michael" To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a11c02a8e5a3d7d04ff909ca4 X-Virus-Checked: Checked by ClamAV on apache.org --001a11c02a8e5a3d7d04ff909ca4 Content-Type: text/plain; charset=UTF-8 I don't think there is an easy "answer" to this... A possible approach, based upon the implied dimensions of the problem, would be to maintain a bloom filter over "words" for each user as a partition key with the user as clustering key. Then a single query would efficiently yield the list of users that "may" match and other techniques could be used to refine that list down to actual matches. ml On Thu, Jul 31, 2014 at 10:44 AM, Philo Yang wrote: > Hi all, > > I have a cluster of 2.0.6 and one of my tables is like this: > CREATE TABLE word ( > user text, > word text, > flag double, > PRIMARY KEY (user, word) > ) > > each "user" has about 10000 "word" per node. I have a requirement of > selecting all rows where user='someuser' and word is in a large set whose > size is about 1000 . > > In C* document, it is not recommended to use "select ... in" just like: > > select from word where user='someuser' and word in ('a','b','aa','ab',...) > > So now I select all rows where user='someuser' and filtrate them via > client rather than via C*. Of course, I use Datastax Java Driver to page > the resultset by setFetchSize(1000). Is it the best way? I found the > system's load is high because of large range query, should I change to > select for only one row each time and select 1000 times? > > just like: > select from word where user='someuser' and word = 'a'; > select from word where user='someuser' and word = 'b'; > select from word where user='someuser' and word = 'c'; > ..... > > Which method will cause lower pressure on Cassandra cluster? > > Thanks, > Philo Yang > > --001a11c02a8e5a3d7d04ff909ca4 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
I don't think there is an easy "answer" to t= his...

A possible approach, based upon the implied dimen= sions of the problem, would be to maintain a bloom filter over "words&= quot; for each user as a partition key with the user as clustering key. The= n a single query would efficiently yield the list of users that "may&q= uot; match and other techniques could be used to refine that list down to a= ctual matches.

ml


On Thu, Jul 31, 2014 at 10:44 AM, Philo Yang <ud1937@g= mail.com> wrote:
Hi all,

= I have a cluster of 2.0.6 and one of my tables is like this:
CREATE TABLE word (
=C2=A0 user text,
=C2=A0 word text,
=C2=A0 flag do= uble,
=C2=A0 PRIMARY KEY (user, word)
)

each "user" has about 10000= "word" per node. I have a requirement of selecting all rows wher= e user=3D'someuser' and word is in a large set whose size is about = 1000 .=C2=A0

In C* document, it is not recommended to use "sele= ct ... in" just like:

select from word where = user=3D'someuser' and word in ('a','b','aa'= ,'ab',...)=C2=A0

So now I select all rows where user=3D'someuser'= ; and=C2=A0filtrate them via client rather than via C*. Of course, I use Da= tastax Java Driver to page the resultset by setFetchSize(1000). =C2=A0Is it= the best way? I found the system's load is high because of large range= query, should I change to select for only one row each time and select 100= 0 times?

just like:
select from word where user=3D'= ;someuser' and word =3D 'a';
select from word where u= ser=3D'someuser' and word =3D 'b';
select fro= m word where user=3D'someuser' and word =3D 'c';
.....

Which method will cause lower pre= ssure on Cassandra cluster?

Thanks,
Philo Yang


--001a11c02a8e5a3d7d04ff909ca4--