From: Dan Gould
Date: Wed, 06 Nov 2013 14:08:58 -0800
To: user@cassandra.apache.org
Subject: CQL 'IN' predicate

I was wondering if anyone had a sense of performance/best practices around the 'IN' predicate.

I have a list of up to ~30k keys that I want to look up in a table (typically queries will have <500, but I worry about the long tail). Most of them will not exist in the table, but, say, about 10-20% will.

Would it be best to do:

1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ...... uuid30000);

2) Split into smaller batches:
   for group_of_100 in all_30000:
      // ** Issue in parallel or block after each one??
      SELECT fields FROM table WHERE id in (group_of_100 uuids);

3) Something else?

My guess is that (1) is fine and that the only worry is too much data returned (which won't be a problem in this case), but I wanted to check first that it's not a C* anti-pattern.

[Conversely, is a batch insert with up to 30k items ok?]

Thanks,
Dan
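
For reference, option (2) could be sketched in Python roughly as follows. This only illustrates the splitting and the "issue in parallel" variant; it does not settle which option performs better. The names chunked, build_in_query, lookup_in_groups and run_query are hypothetical. run_query stands in for whatever driver call actually executes the statement, and the "?" placeholder style is an assumption that depends on the driver in use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def chunked(keys: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(keys)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group


def build_in_query(table: str, fields: Sequence[str], n_keys: int) -> str:
    """Build a parameterized SELECT ... WHERE id IN (?, ?, ...) statement.

    The "?" placeholder is an assumption; some drivers use another marker.
    """
    placeholders = ", ".join(["?"] * n_keys)
    return f"SELECT {', '.join(fields)} FROM {table} WHERE id IN ({placeholders})"


def lookup_in_groups(
    keys: Iterable,
    run_query: Callable[[list], Iterable],
    group_size: int = 100,
    max_in_flight: int = 4,
) -> List:
    """Run one IN query per group of keys and collect all returned rows.

    `run_query` (hypothetical) takes one group of keys and returns its rows.
    At most `max_in_flight` groups are executed concurrently; results are
    collected in the same order as the groups were produced.
    """
    rows: List = []
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for result in pool.map(run_query, chunked(keys, group_size)):
            rows.extend(result)
    return rows
```

Setting max_in_flight=1 gives the "block after each one" variant, while a larger value issues several groups at once.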