Subject: Re: Secondary Index, performance , data type
From: aaron morton
Date: Wed, 4 Jul 2012 21:52:18 +1200
To: user@cassandra.apache.org

> select my_cf where columnA = a and columnB = b and columnC = c and columnD = d

Cassandra will only use one equality clause to select the candidate rows. The other clauses are then applied as filters to the rows selected by that first clause.

The clause used to select candidate rows is chosen based on statistics that estimate the number of columns in the indexes.

> Do you have any ideas? Is there any way to understand how Cassandra internally runs the query (a kind of "explain plan")?

The only way I know of to see the "query plan" is to set DEBUG logging on org.apache.cassandra.db.index.keys.KeysSearcher and look for the message "Primary scan clause is ".

Note, if this is a common query you may get better performance creating a custom secondary index than using four equality clauses in an index scan.

> 2/ Are there any limitations on the number of criteria we can usually have?

None that I know of.
The query will probably run slower the more clauses you have.

> 3/ Even if we have different data types (date, string, int), we have stored them all as UTF8Type. Could we expect performance improvements if we use DateType, LongType?

No. The main issue is going to be the selectivity of the primary scan clause, followed by the number of additional clauses. Their types will have very little / no impact.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/07/2012, at 3:59 AM, Olivier Mallassi wrote:

> Hi all
>
> We have 4 indexed columns, all configured as UTF8Type (even though one column is a date and another an integer).
>
> 1/ The read query we run can have up to 4 criteria:
> select my_cf where columnA = a and columnB = b and columnC = c and columnD = d
>
> This query is fast (<500ms) with up to 3 criteria, but when we add the fourth one, the execution time is 9.5s.
> Do you have any ideas? Is there any way to understand how Cassandra internally runs the query (a kind of "explain plan")?
>
> 2/ Are there any limitations on the number of criteria we can usually have?
>
> 3/ Even if we have different data types (date, string, int), we have stored them all as UTF8Type. Could we expect performance improvements if we use DateType, LongType?
>
> Many thx for all your answers.
>
> --
> ............................................................
> Olivier Mallassi
> OCTO Technology
> ............................................................
> 50, Avenue des Champs-Elysées
> 75008 Paris
>
> Mobile: (33) 6 28 70 26 61
> Tél: (33) 1 58 56 10 00
> Fax: (33) 1 58 56 10 01
>
> http://www.octo.com
> Octo Talks! http://blog.octo.com
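
The index scan described in the reply (one equality clause selects the candidate rows, the remaining clauses are applied as filters) can be sketched as a toy model. This is not Cassandra code: the in-memory `rows` dict, the `build_indexes` and `index_scan` helpers, and the rule "pick the clause whose index entry is smallest" are all assumptions made for illustration.

```python
# Toy model only, not Cassandra code. The data, the helper names and the
# "smallest index entry wins" rule are illustrative assumptions.
from collections import defaultdict


def build_indexes(rows, columns):
    """Map column -> value -> set of row keys (a stand-in for a secondary index)."""
    indexes = {col: defaultdict(set) for col in columns}
    for key, row in rows.items():
        for col in columns:
            if col in row:
                indexes[col][row[col]].add(key)
    return indexes


def index_scan(rows, indexes, clauses):
    """Run an AND of equality clauses given as {column: value}.

    Returns (primary_column, matching_keys, candidates_examined).
    """
    # Choose one clause as the primary scan clause: here, the one whose
    # index entry holds the fewest row keys (assumed selectivity estimate).
    primary = min(clauses, key=lambda col: len(indexes[col].get(clauses[col], ())))
    candidates = indexes[primary].get(clauses[primary], set())
    # Every clause is then checked row by row against the candidates.
    matches = sorted(
        key for key in candidates
        if all(rows[key].get(col) == val for col, val in clauses.items())
    )
    return primary, matches, len(candidates)


rows = {
    "r1": {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "d"},
    "r2": {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "x"},
    "r3": {"columnA": "a", "columnB": "x", "columnC": "c", "columnD": "d"},
    "r4": {"columnA": "a", "columnB": "b", "columnC": "x", "columnD": "y"},
}
indexes = build_indexes(rows, ["columnA", "columnB", "columnC", "columnD"])
query = {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "d"}
primary, matches, examined = index_scan(rows, indexes, query)
print(primary, matches, examined)
```

In this model the work done is proportional to the number of candidates returned by the primary clause, which is why the selectivity of that clause matters more than the column types.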