lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: BooleanQuery - TooManyClauses
Date Tue, 26 Oct 2004 18:21:26 GMT
On Oct 26, 2004, at 1:55 PM, Angelov, Rossen wrote:
> OK, I got that part - to limit the clause counts limit the range. In 
> my case
> replace the timestamp with date and if it gets too big again replace 
> the
> YYYYMMDD with YYYYMM and later with YYYY. And that of course includes 
> fixing
> the old files every time so they have new field.
> I was actually looking for more robust solution but this should do for 
> now.

More robust, as in does not require re-indexing?

This is one of the tricky things about making a search engine.  Having 
fast searches, yet reserving the right to change how you query at a 
later time without re-indexing.  Unfortunately it doesn't work that 
way.  You have to consider the types of queries that will be made in 
order to index appropriately.  Changes in types of queries may 
necessitate a re-index to accommodate.

You may want to go ahead and index one field as YYYYMMDD, and another 
as YYYY.  and possibly another as YYYYMM.

You could also utilize a Filter for constraining searches based on a 
date range.  QueryFilter is one option, or writing a custom one that 
selects the appropriate documents.

	Erik


>
> Thanks,
> Ross
>
> -----Original Message-----
> From: Terry Steichen [mailto:terry@net-frame.com]
> Sent: Tuesday, October 26, 2004 11:28 AM
> To: Lucene Users List
> Subject: Re: BooleanQuery - TooManyClauses
>
>
> I think what Erik's asking is whether you can live with expressing your
> indexed date in the form of YYYYMMDD, without the hour and minute 
> extension.
> That will sharply educe the number of range query expansion terms.  If
> you're using the timestamp as a unique identifier, you might consider
> creating two fields, one for the unique identifier (YYYYMMDDHHmmssZ) 
> and one
> for the date (YYYYMMDD), and only use the range on the date field (not 
> on
> the timestamp field)
>
> Regards,
>
> Terry
>   ----- Original Message -----
>   From: Angelov, Rossen
>   To: 'Lucene Users List'
>   Sent: Tuesday, October 26, 2004 11:43 AM
>   Subject: RE: BooleanQuery - TooManyClauses
>
>
>>
>> On Oct 25, 2004, at 6:35 PM, Angelov, Rossen wrote:
>>> Why there is a limit on the number of clauses? and is there any harm 
>>> in
>>> setting MaxClauseCount to Integer.MAX_VALUE?
>>
>> The harm is in performance and resource utilization.  Rather than do
>> this, though, read on...
>>
>>> I'm using a Range Query on a field that represents dates and getting
>>> BooleanQuery$TooManyClauses exception.
>>> This is the query -  +/article/createddateiso8601:[20030101000000 TO
>>> 20031231999999]
>>
>> Do you really need to do ranges down to that time level?  Or are you
>> really just concerned with date?  If you indexed using YYYYMMDD
>> instead, there would only be a maximum of 365 terms in that range,
>> whereas you've got zillions (ok, I was too lazy to do the math!  But
>> far more than 1,024).
>
>   I need to do range searches. They are part of the requirements and 
> even
>   worse, the range can be as big as up to 10 years for now. It will get
>   bigger. I'm indexing using YYYYMMDDHHmmssZ format and as you said 
> there
> will
>   be more than just 365 terms per year. This number changes every day 
> as new
>   documents are indexed daily. The only limit I can see is the number 
> of
>   documents that were indexed. I guess maxClauseCount can't be more 
> than the
>   indexed documents.
>
>> I recommend changing how you index dates, or at least use a different
>> field for queries that do not need to concern themselves with the
>> timestamp aspect.
>
>   What do you mean change how the dates are indexed? By the way this 
> field
> is
>   indexed as a string.
>
>>
>> Erik
>>
>>
>
>   Ross
>
>   "This communication is intended solely for the addressee and is
>   confidential and not for third party unauthorized distribution."
>
>
> "This communication is intended solely for the addressee and is
> confidential and not for third party unauthorized distribution."
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message