lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: BooleanQuery - TooManyClauses
Date Tue, 26 Oct 2004 16:27:53 GMT
I think what Erik's asking is whether you can live with expressing your indexed date in the
form of YYYYMMDD, without the hour and minute extension.  That will sharply educe the number
of range query expansion terms.  If you're using the timestamp as a unique identifier, you
might consider creating two fields, one for the unique identifier (YYYYMMDDHHmmssZ) and one
for the date (YYYYMMDD), and only use the range on the date field (not on the timestamp field)

Regards,

Terry
  ----- Original Message ----- 
  From: Angelov, Rossen 
  To: 'Lucene Users List' 
  Sent: Tuesday, October 26, 2004 11:43 AM
  Subject: RE: BooleanQuery - TooManyClauses 


  >
  >On Oct 25, 2004, at 6:35 PM, Angelov, Rossen wrote:
  >> Why there is a limit on the number of clauses? and is there any harm in
  >> setting MaxClauseCount to Integer.MAX_VALUE?
  >
  >The harm is in performance and resource utilization.  Rather than do 
  >this, though, read on...
  >
  >> I'm using a Range Query on a field that represents dates and getting
  >> BooleanQuery$TooManyClauses exception.
  >> This is the query -  +/article/createddateiso8601:[20030101000000 TO
  >> 20031231999999]
  >
  >Do you really need to do ranges down to that time level?  Or are you 
  >really just concerned with date?  If you indexed using YYYYMMDD 
  >instead, there would only be a maximum of 365 terms in that range, 
  >whereas you've got zillions (ok, I was too lazy to do the math!  But 
  >far more than 1,024).

  I need to do range searches. They are part of the requirements and even
  worse, the range can be as big as up to 10 years for now. It will get
  bigger. I'm indexing using YYYYMMDDHHmmssZ format and as you said there will
  be more than just 365 terms per year. This number changes every day as new
  documents are indexed daily. The only limit I can see is the number of
  documents that were indexed. I guess maxClauseCount can't be more than the
  indexed documents.

  >I recommend changing how you index dates, or at least use a different 
  >field for queries that do not need to concern themselves with the 
  >timestamp aspect.

  What do you mean change how the dates are indexed? By the way this field is
  indexed as a string.

  >
  > Erik
  >
  >

  Ross

  "This communication is intended solely for the addressee and is
  confidential and not for third party unauthorized distribution."


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message