Subject: Re: Secondary indexes performance
From: aaron morton
Date: Thu, 23 Jun 2011 10:36:14 +1200
To: user@cassandra.apache.org

> it will probably be better to denormalize and store
> some precomputed data

Yes, if you know there are queries you need to serve it is better to support those directly in the data model.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jun 2011, at 23:52, Wojciech Pietrzok wrote:

> OK, got some results (below).
> 2 nodes, one on localhost, second on LAN, reading with
> ConsistencyLevel.ONE, buffer_size=512 rows (that's how many rows
> pycassa will get on one connection, then it will use the last row_id as
> the start row for the next query)
>
> Query types:
> 1) get_range - just added limit of 1024 rows
> 2) get_indexed_slices ASCII - one term: on indexed column with ASCII type
> 3) get_indexed_slices INT - one term: on indexed column with INT type
> 4) get_indexed_slices ASCII + GTE, LTE on indexed INT - three terms:
> on indexed column with INT type + LTE, GTE on indexed column with INT
> type
> 5) get_indexed_slices 2 terms, ASCII - two terms, both columns
> indexed, with ASCII type
> 6) get_indexed_slices ASCII + GTE, LTE on non indexed INT - like 4)
> but LTE, GTE are on non-indexed column
>
> 3 runs for each set of queries, on successive runs times were better.
> Times are in seconds
>
>
> But if you say that 1024 rows is a reasonably big slice (not mentioning
> over 10k rows) it will probably be better to denormalize and store
> some precomputed data
>
>
> Results:
>
> # Run 1
> PERF: [a] get_range: 0.58[s]
> PERF: [a] get_indexed_slices ASCII: 3.96[s]
> PERF: [a] get_indexed_slices INT: 1.82[s]
> PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT: 1.31[s] # 314 returned
> PERF: [cr] get_indexed_slices ASCII: 1.13[s]
> PERF: [cr] get_indexed_slices 2 terms, ASCII: 8.69[s]
>
> # Run 2, same queries
> PERF: [a] get_range: 0.33[s]
> PERF: [a] get_indexed_slices ASCII: 0.36[s]
> PERF: [a] get_indexed_slices INT: 5.39[s]
> PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 5.42[s] # 314 returned
> PERF: [cr] get_indexed_slices ASCII: 0.55[s]
> PERF: [cr] get_indexed_slices 2 terms, ASCII: 3.57[s]
>
> # Run 3, same queries
> PERF: [a] get_range: 0.18[s]
> PERF: [a] get_indexed_slices ASCII: 0.39[s]
> PERF: [a] get_indexed_slices INT: 0.83[s]
> PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 0.85[s] # 314 returned
> PERF: [cr] get_indexed_slices ASCII: 0.39[s]
> PERF: [cr] get_indexed_slices 2 terms, ASCII: 3.36[s]
>
> # changed some terms, so 1024 rows are always returned
> # Run 1
> PERF: [a] get_range: 0.31[s]
> PERF: [a] get_indexed_slices ASCII: 3.14[s]
> PERF: [a] get_indexed_slices INT: 0.70[s]
> PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 4.72[s]
> PERF: [cr] get_indexed_slices ASCII: 0.73[s]
> PERF: [cr] get_indexed_slices 2 terms, ASCII: 0.85[s]
> PERF: [cr] get_indexed_slices ASCII + GTE, LTE on non indexed INT : 2.17[s]
>
> # Run 2, same queries
> PERF: [a] get_range: 0.20[s]
> PERF: [a] get_indexed_slices ASCII: 0.60[s]
> PERF: [a] get_indexed_slices INT: 1.22[s]
> PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 1.27[s]
> PERF: [cr] get_indexed_slices ASCII: 0.48[s]
> PERF: [cr] get_indexed_slices 2 terms, ASCII: 0.50[s]
> PERF: [cr] get_indexed_slices ASCII + GTE, LTE on non indexed INT : 2.22[s]
>
> # Run 3, same queries
> PERF: [a] get_range: 0.25[s]
> PERF: [a] get_indexed_slices ASCII: 0.44[s]
> PERF: [a] get_indexed_slices INT: 0.89[s]
> PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 6.58[s]
> PERF: [cr] get_indexed_slices ASCII: 1.18[s]
> PERF: [cr] get_indexed_slices 2 terms, ASCII: 0.50[s]
> PERF: [cr] get_indexed_slices ASCII + GTE, LTE on non indexed INT : 2.09[s]
>
>
>
>
> 2011/6/21 aaron morton:
>> Can you provide some more information on the query you are running? How many terms are you selecting with?
>>
>> How long does it take to return 1024 rows? IMHO that's a reasonably big slice to get.
>>
>> The server will pick the most selective equality predicate, and then filter the results from that using the other predicates.
>>
>> Cheers
>
>
> --
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> KosciaK mail: kosciak1@gmail.com
> www : http://kosciak.net/
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
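
A minimal sketch of the behaviour described in the thread ("pick the most selective equality predicate, and then filter the results from that using the other predicates"), written as plain Python over in-memory dicts. It is only a model for reasoning about the timings, not Cassandra's actual implementation. The names `build_index` and `indexed_slices`, the tuple form of the expressions, and the reading of "most selective" as "fewest matching rows" are all assumptions made for illustration.

```python
# Illustrative model only; names and data layout are hypothetical.
from collections import defaultdict

EQ, GTE, LTE = "EQ", "GTE", "LTE"

OPS = {
    EQ: lambda a, b: a == b,
    GTE: lambda a, b: a >= b,
    LTE: lambda a, b: a <= b,
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:
            index[cols[column]].add(key)
    return index


def indexed_slices(rows, indexes, clause, count=1024):
    """Return up to `count` row keys matching every (column, op, value) term.

    One EQ term on an indexed column drives the lookup; every other term
    is applied as a filter on the candidate rows that lookup produces.
    """
    drivers = [(c, v) for c, op, v in clause if op == EQ and c in indexes]
    if not drivers:
        raise ValueError("need at least one EQ term on an indexed column")

    # Assumption: "most selective" means the EQ term matching the fewest rows.
    col, val = min(drivers, key=lambda d: len(indexes[d[0]].get(d[1], ())))

    result = []
    for key in sorted(indexes[col].get(val, ())):
        cols = rows[key]
        # Each candidate row is read and checked against all terms.
        if all(c in cols and OPS[op](cols[c], v) for c, op, v in clause):
            result.append(key)
            if len(result) == count:
                break
    return result


rows = {
    "r1": {"kind": "a", "size": 5},
    "r2": {"kind": "a", "size": 50},
    "r3": {"kind": "b", "size": 7},
}
indexes = {"kind": build_index(rows, "kind")}
matches = indexed_slices(
    rows, indexes, [("kind", EQ, "a"), ("size", GTE, 1), ("size", LTE, 10)]
)
```

In this model the GTE/LTE terms only narrow what is returned; the number of candidate rows read is set by the single equality term chosen as the driver. That is one way to see why supporting a known query directly in the data model can be cheaper than filtering a large indexed slice.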