Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 663E110D80 for ; Wed, 6 Nov 2013 23:06:23 +0000 (UTC) Received: (qmail 89045 invoked by uid 500); 6 Nov 2013 23:06:20 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 88984 invoked by uid 500); 6 Nov 2013 23:06:20 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 88975 invoked by uid 99); 6 Nov 2013 23:06:20 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Nov 2013 23:06:20 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy includes SPF record at spf.trusted-forwarder.org) Received: from [209.85.215.177] (HELO mail-ea0-f177.google.com) (209.85.215.177) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Nov 2013 23:06:15 +0000 Received: by mail-ea0-f177.google.com with SMTP id f15so84804eak.36 for ; Wed, 06 Nov 2013 15:05:53 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=kyEjLCQgDbT18pK4i5mCUWog88MK4CEHuRZdSECjslY=; b=I+titxvwPq+YGVs48n6s5rDhi1qtJ/uqhp8Fwqf1wZEIMmrZQfaenBPlgU2/B0O/3l TgVpkQVKYbmRlHW/nR5cwN+afYKe7m7PVM+Chrk5lUwvVodpIaXElsDZ/gCE0vWsfkLm JXVvZI25B3HtjdOMPN7OV33mSlNpTaVtd9wrHliOrh+hvBKHGdf3b8ZIGAtLbJ1M6bkI /U4BNoBlzN49BIhogPckRFFttms5dopaFJfsehmGyqr13WZZTuAKktYFB51cTid3iNpB njGg4njkps9y+J5JycX+qfAR4/dzg7Rm6EcgZKJf0bAnj1tzztOiU6itZQ0QPEXiIkWM 9vsw== X-Gm-Message-State: ALoCoQnZfI3ImVomqg0cdDLh78ZIMUbBV5D9/fyWNAPAS6SgY25jWE/XnDWAv4NtdVQB2IblCt0t MIME-Version: 1.0 X-Received: by 10.14.107.68 with SMTP id n44mr6247542eeg.26.1383779153406; Wed, 06 Nov 2013 15:05:53 -0800 (PST) Received: by 10.223.2.201 with HTTP; Wed, 6 Nov 2013 15:05:53 -0800 (PST) X-Originating-IP: [70.112.126.233] In-Reply-To: <527ABDFA.40607@chill.com> References: <527ABDFA.40607@chill.com> Date: Wed, 6 Nov 2013 17:05:53 -0600 Message-ID: Subject: Re: CQL 'IN' predicate From: Nate McCall To: Cassandra Users Content-Type: multipart/alternative; boundary=001a11c29ad2e3020704ea8a306b X-Virus-Checked: Checked by ClamAV on apache.org --001a11c29ad2e3020704ea8a306b Content-Type: text/plain; charset=ISO-8859-1 Unless you explicitly set a page size (i'm pretty sure the query is converted to a paging query automatically under the hood) you will get capped at the default of 10k which might get a little weird semantically. That said, you should experiment with explicit page sizes and see where it gets you (i've not tried this yet with an IN clause - would be real curious to hear how it worked). Another thing to consider is that it's a pretty big statement to parse every time. You might want to go the (much) smaller batch route so these can be stored procedures? (another thing I havent tried with IN clause - don't see why it would not work though). On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould wrote: > I was wondering if anyone had a sense of performance/best practices > around the 'IN' predicate. > > I have a list of up to potentially ~30k keys that I want to look up in a > table (typically queries will have <500, but I worry about the long tail). > Most > of them will not exist in the table, but, say, about 10-20% will. > > Would it be best to do: > > 1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ...... uuid30000); > > 2) Split into smaller batches-- > for group_of_100 in all_30000: > // ** Issue in parallel or block after each one?? > SELECT fields FROM table WHERE id in (group_of_100 uuids); > > 3) Something else? > > My guess is that (1) is fine and that the only worry is too much data > returned (which won't be a problem in this case), but I wanted to check > that it's not a C* anti-pattern before. > > [Conversely, is a batch insert with up to 30k items ok?] > > Thanks, > Dan > > -- ----------------- Nate McCall Austin, TX @zznate Co-Founder & Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com --001a11c29ad2e3020704ea8a306b Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Unless you explicitly set a page size (i'm pretty sure= the query is converted to a paging query automatically under the hood) you= will get capped at the default of 10k which might get a little weird seman= tically. That said, you should experiment with explicit page sizes and see = where it gets you (i've not tried this yet with an IN clause - would be= real curious to hear how it worked).=A0

Another thing to consider is that it's a pretty big stat= ement to parse every time. You might want to go the (much) smaller batch ro= ute so these can be stored procedures? (another thing I havent tried with I= N clause - don't see why it would not work though).


<= br>
On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould= <d= an@chill.com> wrote:
I was wondering if anyone had a sense of per= formance/best practices
around the 'IN' predicate.

I have a list of up to potentially ~30k keys that I want to look up in a table (typically queries will have <500, but I worry about the long tail= ). =A0Most
of them will not exist in the table, but, say, about 10-20% will.

Would it be best to do:

1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ...... uuid30000);
2) Split into smaller batches--
for group_of_100 in all_30000:
=A0 =A0// ** Issue in parallel or block after each one??
=A0 =A0SELECT fields FROM table WHERE id in (group_of_100 uuids);

3) Something else?

My guess is that (1) is fine and that the only worry is too much data retur= ned (which won't be a problem in this case), but I wanted to check that= it's not a C* anti-pattern before.

[Conversely, is a batch insert with up to 30k items ok?]

Thanks,
Dan




--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Fo= under & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpi= ckle.com
--001a11c29ad2e3020704ea8a306b--