lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Fraschetti <frasche...@gmail.com>
Subject Re: BooleanQuery - Too Many Clases on date range.
Date Sun, 03 Oct 2004 21:17:40 GMT
So i decicded to move my epoch date to the  20040608 date which fixed
my boolean query problem in regards to my current data size (approx
600,000) ....

but now as soon as I do a query like ...      a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure jguru.com can handle it too.. any ideas? With out
without a date dange specified i still get teh  TooManyClauses error. 
I tired cranking the maxclauses up to Integer.MaxInt, but java gave me
a out of memory error. Is this b/c the boolean search tried to
allocate that many clauses by default or because my query actually
needed that many clauses?  Why does it work on small indexes but not
large? Is there any way to have the parser create as many clauses as
it can and then search with what it has? w/o recompiling the source?

Thanks!


On Fri, 01 Oct 2004 15:48:36 +0200, Damian Gajda <dgajda@caltha.pl> wrote:
> Dnia 01-10-2004, pi┬▒ o godzinie 07:57 -0500, Scott Ganyo napisa┬│(a):
> > You can use:
> >
> > BooleanQuery.setMaxClauseCount(int maxClauseCount);
> 
> I had a similar problem with date ranges. Someone on the list suggested
> me a solution to my problems but it was more clever than the above
> solution, which helps but makes the searches work slower and is memory
> hungry (many terms are loaded into memmory, and than searched).
> 
> The solution suggested was to split dates into sub fields during
> indexing and use those fields while searching. This makes it more
> effective but harder to create a query (personally I prefer working on
> queries build using Lucene API, than ones parsed by QueryParser).
> 
> For instance the time stamp 2004-10-01 15:34:26.001 may be split into
> following fields:
> <some-date>_year: 2004
> <some-date>_month: 10
> <some-date>_day: 01
> <some-date>_time: 153426001
> 
> The above fields should be indexed so they can be searched. They give
> some nice possibilities, for instance fast and easy querying for all
> documents that have a date in a particular year, month or day of month.
> For conveniece one could also store weekdays.
> 
> A query for a date range from 15th august to 10th october 2004 (in no
> particular query language - this just gives an idea):
> <some-date>_year = 2004 AND (
>   (<some-date>_month = 08 AND <some-date>_day >= 15) OR
>   (<some-date>_month=09) OR
>   (<some-date>_month = 10 AND <some-date>_day <= 10)
> )
> 
> As You can see it is easy to build such a query from the lucene API. The
> equalities are Term queries. The inequalities are Range queries. The AND
> and OR operators can be provided by usage of Boolean queries.
> 
> Have fun implementing the solution - it has only one disadvantage. It
> makes results sorting not so easy. The solution for it is usage of
> multiple sort fields, or another stored field containing a full date
> (one almost surely will need to store a date for each hit, unless You
> want to write some baroque code to calculate date from split fields
> values).
> 
> Have fun,
> --
> Damian Gajda
> Caltha Sp. j.
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message