Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 67309 invoked from network); 18 Nov 2010 19:30:46 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 18 Nov 2010 19:30:46 -0000 Received: (qmail 49821 invoked by uid 500); 18 Nov 2010 19:31:18 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 49788 invoked by uid 500); 18 Nov 2010 19:31:18 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 49781 invoked by uid 99); 18 Nov 2010 19:31:18 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Nov 2010 19:31:18 +0000 X-ASF-Spam-Status: No, hits=-1.6 required=10.0 tests=RCVD_IN_DNSWL_MED,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [65.55.88.12] (HELO TX2EHSOBE004.bigfish.com) (65.55.88.12) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Nov 2010 19:31:09 +0000 Received: from mail33-tx2-R.bigfish.com (10.9.14.249) by TX2EHSOBE004.bigfish.com (10.9.40.24) with Microsoft SMTP Server id 14.1.225.8; Thu, 18 Nov 2010 19:30:46 +0000 Received: from mail33-tx2 (localhost.localdomain [127.0.0.1]) by mail33-tx2-R.bigfish.com (Postfix) with ESMTP id B46FC8F04CD for ; Thu, 18 Nov 2010 19:30:46 +0000 (UTC) X-SpamScore: -37 X-BigFish: VPS-37(zz542N1432N98dN1447R9371P10d1Izz1202hzz8275dhz2dh2a8h668h67dh685h61h) X-Spam-TCS-SCL: 0:0 X-Forefront-Antispam-Report: KIP:(null);UIP:(null);IPVD:NLI;H:CPHUB1.canoga.com;RD:rrcs-74-62-37-82.west.biz.rr.com;EFVD:NLI Received: from mail33-tx2 (localhost.localdomain [127.0.0.1]) by mail33-tx2 (MessageSwitch) id 1290108645546750_3128; Thu, 18 Nov 2010 19:30:45 +0000 (UTC) Received: from TX2EHSMHS018.bigfish.com (unknown [10.9.14.242]) by mail33-tx2.bigfish.com (Postfix) with ESMTP id 81709C88051 for ; Thu, 18 Nov 2010 19:30:45 +0000 (UTC) Received: from CPHUB1.canoga.com (74.62.37.82) by TX2EHSMHS018.bigfish.com (10.9.99.118) with Microsoft SMTP Server (TLS) id 14.1.225.8; Thu, 18 Nov 2010 19:30:41 +0000 Received: from CPHUB2.canoga.com (172.16.1.94) by CPHUB1.canoga.com (172.16.1.93) with Microsoft SMTP Server (TLS) id 8.2.213.0; Thu, 18 Nov 2010 11:30:51 -0800 Received: from vserver1.canoga.com ([169.254.2.14]) by CPHUB2.canoga.com ([172.16.1.94]) with mapi; Thu, 18 Nov 2010 11:30:51 -0800 From: "Bergquist, Brett" To: "derby-dev@db.apache.org" Date: Thu, 18 Nov 2010 11:27:10 -0800 Subject: RE: Question on cardinality, statistics, and when things go stale Thread-Topic: Question on cardinality, statistics, and when things go stale Thread-Index: AcuHUagqhh8LAeYQTJm2bqVRk0HsFQAAIPpw Message-ID: <97EB699F861AD841B5908C7CA9C956560127F9D3466E@VSERVER1.canoga.com> References: <97EB699F861AD841B5908C7CA9C956560127F9D345FE@VSERVER1.canoga.com> <4CE5758F.4010001@sbcglobal.net> In-Reply-To: <4CE5758F.4010001@sbcglobal.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US X-TM-AS-Product-Ver: SMEX-8.0.0.1307-6.500.1024-17662.005 X-TM-AS-Result: No--28.090000-0.000000-31 X-TM-AS-User-Approved-Sender: Yes X-TM-AS-User-Blocked-Sender: No Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: canoga.com X-Virus-Checked: Checked by ClamAV on apache.org Thanks for the explanation! So if I understand this correctly, if the inde= xes that I am querying on are always unique (the computed value that is ind= ex is always unique), then as long as the optimizer knows about these stati= stics (ie I have run "syscs_util.syscs_update_statistics()" after the table= has a few rows), and my query is always like "select * from table where in= dexval >=3D 'some starting value' and indexval < 'some ending value'" then = it should always use the index? Again, thanks. Brett -----Original Message----- From: Mike Matrigali [mailto:mikem_app@sbcglobal.net]=20 Sent: Thursday, November 18, 2010 1:51 PM To: derby-dev@db.apache.org Subject: Re: Question on cardinality, statistics, and when things go stale derby uses 2 types of statistics, for this discussion I will call them distribution and cardinality. For distribution derby uses the index themselves at query compile time, so this info never goes stale. For distribution the optimizer might need to know what percentage of the keys in an index are between value1=20 and value2. Cardinality information can go stale and depends on the actual data in the table. Cardinality is used by the optimizer when the query does not=20 have actual values at compile time (ie. ? operators). Cardinality is=20 basically a single number that represents the average number of duplicates per data value in the table. In the case of a non unique index on a single column a logically it will maintain one number that represents the average number of=20 duplicates for a. The stat can never be wrong for a unique index on a single column as it is always 1 value per data value. For a 2 column unique index on (a, b) a cardinality count will be maintained on just the a values as they can be duplicate and it is=20 possible to use the index for just qualifying on a. No cardinality count is necesary for queries that will provide both a and b as again the cardinality count for that is a, b. For many applications once a significant number of rows in the table have been inserted it is enough to get the statistics once and as rows come and go it does not matter as the average does not change. But of course it is easy to define an app that can break this. Your question also depends on the query. Assuming your index is on (a,=20 b) and your query is something like select * from x where a =3D ? and b =3D= =20 ? then the system is unlikely to ever not pick the index. But usually=20 the choice is not as clear. Bergquist, Brett wrote: > I have read as much as I can about this subject and am a little=20 > confused. I understand the cardinality statistics are not computed if=20 > the tables are empty when indexes are created and I can work around that.= =20 >=20 > =20 >=20 > Say I have a table with 15 million records in it and there is a main=20 > index that I need to be used to query and I update the statistics and=20 > verify that query optimizer is using that index. This table will have=20 > many rows inserted and many rows deleted every day. Basically the index= =20 > is a function of a timestamp and a counter. So older records are going=20 > to be deleted and newer records are going to be inserted. >=20 > =20 >=20 > My question is will the query optimizer revert back to doing a table=20 > scan at some point instead of using the index. Basically will the index= =20 > go stale?=20 >=20 > =20 >=20 > A second question is does a call to=20 > "syscs_util.syscs_update_statistics' lock a table/index while it is=20 > being performed? This tables of concern where the indexes might go=20 > stale never have a free second in which inserts and deletes are being don= e.... >=20 > =20 >=20 > =20 >=20 > Thanks for any information. >=20 > =20 >=20 > Brett >=20 > =20 >=20