From: "Terry Steichen"
To: "Lucene Users List"
Subject: Re: BooleanQuery - TooManyClauses
Date: Tue, 26 Oct 2004 12:27:53 -0400

I think what Erik's asking is whether you can live with expressing your
indexed date in the form YYYYMMDD, without the hour and minute
extension. That will sharply reduce the number of range query expansion
terms. If you're using the timestamp as a unique identifier, you might
consider creating two fields, one for the unique identifier
(YYYYMMDDHHmmssZ) and one for the date (YYYYMMDD), and only use the
range on the date field (not on the timestamp field).

Regards,

Terry

----- Original Message -----
From: Angelov, Rossen
To: 'Lucene Users List'
Sent: Tuesday, October 26, 2004 11:43 AM
Subject: RE: BooleanQuery - TooManyClauses

>
>On Oct 25, 2004, at 6:35 PM, Angelov, Rossen wrote:
>> Why is there a limit on the number of clauses? And is there any harm in
>> setting MaxClauseCount to Integer.MAX_VALUE?
>
>The harm is in performance and resource utilization. Rather than do
>this, though, read on...
>
>> I'm using a Range Query on a field that represents dates and getting
>> a BooleanQuery$TooManyClauses exception.
>> This is the query - +/article/createddateiso8601:[20030101000000 TO
>> 20031231999999]
>
>Do you really need to do ranges down to that time level? Or are you
>really just concerned with date? If you indexed using YYYYMMDD
>instead, there would only be a maximum of 365 terms in that range,
>whereas you've got zillions (ok, I was too lazy to do the math! But
>far more than 1,024).

I need to do range searches. They are part of the requirements and, even
worse, the range can be as big as 10 years for now. It will get bigger.
I'm indexing using the YYYYMMDDHHmmssZ format and, as you said, there
will be more than just 365 terms per year. This number changes every day
as new documents are indexed daily. The only limit I can see is the
number of documents that were indexed.
I guess maxClauseCount can't be more than the number of indexed
documents.

>I recommend changing how you index dates, or at least use a different
>field for queries that do not need to concern themselves with the
>timestamp aspect.

What do you mean by changing how the dates are indexed? By the way, this
field is indexed as a string.

>
> Erik
>

Ross
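
The two-field suggestion above can be sketched in plain Java. This is a minimal illustration, not code from the thread: the class name DateTerms and its methods are hypothetical, no Lucene classes are used, and the time zone suffix (the Z in YYYYMMDDHHmmssZ) is left out for brevity. It derives the two string values that would be indexed and counts how many distinct day-level terms a range can match at most.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: builds the string values for a day-level date field
// and a full timestamp field, as suggested in the thread.
class DateTerms {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss"); // zone suffix omitted (assumption)

    // Value for the date-only field; range queries would run against this one.
    static String dayTerm(LocalDateTime t) {
        return t.format(DAY);
    }

    // Value for the full timestamp field, usable as a unique identifier.
    static String timestampTerm(LocalDateTime t) {
        return t.format(STAMP);
    }

    // Upper bound on distinct day-level terms in an inclusive date range.
    static long maxDayTerms(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to) + 1;
    }

    public static void main(String[] args) {
        LocalDateTime created = LocalDateTime.of(2003, 6, 15, 14, 30, 5);
        System.out.println(timestampTerm(created)); // 20030615143005
        System.out.println(dayTerm(created));       // 20030615
        // Calendar year 2003, the range from the original query
        System.out.println(maxDayTerms(LocalDate.of(2003, 1, 1),
                                       LocalDate.of(2003, 12, 31))); // 365
        // A ten-year range, as mentioned in the requirements
        System.out.println(maxDayTerms(LocalDate.of(2003, 1, 1),
                                       LocalDate.of(2012, 12, 31))); // 3653
    }
}
```

Note that the ten-year figure is still above the 1,024 clauses mentioned earlier in the thread, so a day-level field by itself may not be enough for the largest ranges; a coarser field (for example YYYYMM) or a higher clause limit might still be needed there.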