lucene-java-user mailing list archives

From "Vanlerberghe, Luc" <Luc.Vanlerber...@bvdep.com>
Subject RE: BooleanQuery - TooManyClauses
Date Tue, 26 Oct 2004 18:22:43 GMT
Even if you need to be able to search on ranges that include the time,
you could benefit from adding a few extra fields to your documents.

For example, add a date field and an hour field.

If the user then specifies a range between 2001-08-10 11:00 and
2004-10-11 13:00, you break it up behind the scenes into three parts as
follows:
- a query on the date field alone, testing on the range 2001-08-11 to
  2004-10-10 (i.e. all dates fully within the date range) => max number
  of clauses = number of distinct dates in your documents
- a query on the hour field for the first date => max number of
  clauses = 24
- a query on the hour field for the last date => max number of
  clauses = 24
(You'll need a special case if the start and end happen to be on the
same date, of course.)

I'm not that familiar with the QueryParser syntax yet, but it should
look something like this (note the use of curly brackets for the
exclusive date ranges):
(date:{20010810 TO 20041011}) OR (+date:20010810 +time:[11 TO ]) OR
(+date:20041011 +time:{ TO 13})
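
A minimal sketch of the three-part split described above, in plain Java.
It only assembles the query string and does not call Lucene. The class
and method names are hypothetical, the field names "date" and "time"
follow the example query, and the "time" field is assumed to hold a
two-digit hour. Because I am not sure open-ended ranges parse, the sketch
writes explicit inclusive bounds (00 and 23) instead, and it treats the
end hour as exclusive and ignores minutes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: it only assembles a query string.
class DateRangeSplitter {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Assumes a "date" field (yyyyMMdd) and a "time" field (two-digit hour, 00-23).
    // The start hour is inclusive, the end hour is exclusive, minutes are ignored.
    static String split(LocalDateTime start, LocalDateTime end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        String firstDay = DAY.format(start);
        String lastDay = DAY.format(end);
        int fromHour = start.getHour();
        int toHour = end.getHour() - 1; // last hour that still matches

        List<String> parts = new ArrayList<>();
        if (firstDay.equals(lastDay)) {
            // Special case: both ends fall on the same date.
            if (fromHour <= toHour) {
                parts.add(hours(firstDay, fromHour, toHour));
            }
        } else {
            // Dates strictly between the two end dates (exclusive range).
            parts.add("(date:{" + firstDay + " TO " + lastDay + "})");
            // Remaining hours of the first date.
            parts.add(hours(firstDay, fromHour, 23));
            // Hours of the last date before the end hour, if there are any.
            if (toHour >= 0) {
                parts.add(hours(lastDay, 0, toHour));
            }
        }
        // An empty result means no whole hour falls inside the range.
        return String.join(" OR ", parts);
    }

    private static String hours(String day, int from, int to) {
        return String.format(Locale.ROOT, "(+date:%s +time:[%02d TO %02d])", day, from, to);
    }
}
```

Calling DateRangeSplitter.split with 2001-08-10 11:00 and 2004-10-11 13:00
gives one date clause plus one hour clause for each end date, so the
expansion stays bounded by the number of distinct dates plus 48.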

If you need even more fine-grained ranges, you can extend this idea by
adding more fields (at the cost of making the generated query even more
complex).

You can already add the separate fields to your documents even if you
don't use them yet...
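
As a rough illustration of the extra fields, the sketch below derives
the values for one document from its creation time. The class name, the
method name and the field names ("timestamp", "date", "time") are
assumptions for this example, and adding the values to the Lucene
document as untokenized fields is left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes the field values, it does not touch the index.
class DateFields {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");

    // Returns field name -> value; the field names are assumptions.
    static Map<String, String> fieldsFor(LocalDateTime created) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", STAMP.format(created)); // full resolution, e.g. as identifier
        fields.put("date", DAY.format(created));        // one term per day for range queries
        fields.put("time", HOUR.format(created));       // at most 24 distinct terms
        return fields;
    }
}
```

Range queries would then go against "date" and "time" only, while the
full timestamp remains available for exact lookups.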

Regards,

Luc


> -----Original Message-----
> From: Terry Steichen [mailto:terry@net-frame.com] 
> Sent: Tuesday, 26 October 2004 18:28
> To: Lucene Users List
> Subject: Re: BooleanQuery - TooManyClauses 
> 
> I think what Erik's asking is whether you can live with 
> expressing your indexed date in the form of YYYYMMDD, without 
> the hour and minute extension.  That will sharply reduce the 
> number of range query expansion terms.  If you're using the 
> timestamp as a unique identifier, you might consider creating 
> two fields, one for the unique identifier (YYYYMMDDHHmmssZ) 
> and one for the date (YYYYMMDD), and only use the range on 
> the date field (not on the timestamp field)
> 
> Regards,
> 
> Terry
>   ----- Original Message -----
>   From: Angelov, Rossen
>   To: 'Lucene Users List' 
>   Sent: Tuesday, October 26, 2004 11:43 AM
>   Subject: RE: BooleanQuery - TooManyClauses 
> 
> 
>   >
>   >On Oct 25, 2004, at 6:35 PM, Angelov, Rossen wrote:
>   >> Why is there a limit on the number of clauses? And is there any
>   >> harm in setting MaxClauseCount to Integer.MAX_VALUE?
>   >
>   >The harm is in performance and resource utilization.  Rather than do
>   >this, though, read on...
>   >
>   >> I'm using a Range Query on a field that represents dates and
>   >> getting a BooleanQuery$TooManyClauses exception.
>   >> This is the query -
>   >> +/article/createddateiso8601:[20030101000000 TO 20031231999999]
>   >
>   >Do you really need to do ranges down to that time level?  Or are you
>   >really just concerned with date?  If you indexed using YYYYMMDD
>   >instead, there would only be a maximum of 365 terms in that range,
>   >whereas you've got zillions (ok, I was too lazy to do the math!  But
>   >far more than 1,024).
> 
>   I need to do range searches. They are part of the requirements and,
>   even worse, the range can be as big as 10 years for now. It will get
>   bigger. I'm indexing using the YYYYMMDDHHmmssZ format and, as you
>   said, there will be more than just 365 terms per year. This number
>   changes every day as new documents are indexed daily. The only limit
>   I can see is the number of documents that were indexed. I guess
>   maxClauseCount can't be more than the number of indexed documents.
> 
>   >I recommend changing how you index dates, or at least use a different
>   >field for queries that do not need to concern themselves with the
>   >timestamp aspect.
> 
>   What do you mean by changing how the dates are indexed? By the way,
>   this field is indexed as a string.
> 
>   >
>   > Erik
>   >
>   >
> 
>   Ross
