lucene-dev mailing list archives

From Matthew King <ma...@gnik.com>
Subject Re: Datefiltering performance issues
Date Fri, 21 Jun 2002 20:13:18 GMT
In a former life (not with Lucene), I've handled this range problem by 
indexing the dates in multiple pieces (YYYY, YYYYMM, YYYYMMDD) and then 
at query time constructed multiple ranges to cover what the user wanted:

So,
   [19990323 20020612]
becomes:
   [19990323 19990331] OR
   [199904 199912] OR
   [2000 2001] OR
   [200201 200205] OR
   [20020601 20020612]

(I may have my Lucene query syntax mussed up here, but hopefully my 
intention is clear.)
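A minimal sketch of the query-side split, in Java since that is what Lucene is written in. It assumes each date was indexed as YYYY, YYYYMM and YYYYMMDD terms in three separate fields, hypothetically named `year`, `month` and `day`. It only produces the pieces in the bracket notation used above; turning each piece into a real range query and OR-ing them together is left out rather than guessing at API details.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits an inclusive date range into day-, month- and
// year-granularity pieces. The field names "day", "month" and "year"
// are hypothetical; the pieces are meant to be OR'ed together.
class DateRangeSplitter {

    static String day(LocalDate d) {
        return String.format("%04d%02d%02d", d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }

    static String month(YearMonth m) {
        return String.format("%04d%02d", m.getYear(), m.getMonthValue());
    }

    static String piece(String field, String lo, String hi) {
        return field + ":[" + lo + " " + hi + "]";
    }

    private static List<String> concat(List<String> head, List<String> tail) {
        List<String> all = new ArrayList<>(head);
        all.addAll(tail);
        return all;
    }

    /** Pieces covering [start, end] (inclusive), in ascending date order. */
    static List<String> split(LocalDate start, LocalDate end) {
        List<String> head = new ArrayList<>(); // pieces grown from the start
        List<String> tail = new ArrayList<>(); // pieces grown from the end, kept ascending
        if (end.isBefore(start)) return head;

        // Ragged month at the start -> day-level piece.
        YearMonth m1 = YearMonth.from(start);
        if (start.getDayOfMonth() != 1) {
            LocalDate monthEnd = m1.atEndOfMonth();
            LocalDate stop = end.isBefore(monthEnd) ? end : monthEnd;
            head.add(piece("day", day(start), day(stop)));
            if (stop.equals(end)) return head;
            m1 = m1.plusMonths(1);
        }
        // Ragged month at the end -> day-level piece.
        YearMonth m2 = YearMonth.from(end);
        if (!end.equals(m2.atEndOfMonth())) {
            tail.add(0, piece("day", day(m2.atDay(1)), day(end)));
            m2 = m2.minusMonths(1);
        }
        if (m1.isAfter(m2)) return concat(head, tail);

        // Ragged year at the start -> month-level piece.
        int y1 = m1.getYear();
        if (m1.getMonthValue() != 1) {
            YearMonth december = YearMonth.of(y1, 12);
            YearMonth stop = m2.isBefore(december) ? m2 : december;
            head.add(piece("month", month(m1), month(stop)));
            if (stop.equals(m2)) return concat(head, tail);
            y1++;
        }
        // Ragged year at the end -> month-level piece.
        int y2 = m2.getYear();
        if (m2.getMonthValue() != 12) {
            tail.add(0, piece("month", month(YearMonth.of(y2, 1)), month(m2)));
            y2--;
        }
        // Whatever remains is a run of whole years.
        if (y1 <= y2) {
            head.add(piece("year", String.format("%04d", y1), String.format("%04d", y2)));
        }
        return concat(head, tail);
    }

    public static void main(String[] args) {
        // The range from the message: [19990323 20020612]
        for (String p : split(LocalDate.of(1999, 3, 23), LocalDate.of(2002, 6, 12))) {
            System.out.println(p);
        }
    }
}
```

Run on the range from the message, it yields the same five pieces as above, each tagged with its field: day:[19990323 19990331], month:[199904 199912], year:[2000 2001], month:[200201 200205], day:[20020601 20020612]. At most two day pieces, two month pieces and one year piece are ever produced, however wide the range.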

This dramatically limits the number of terms that need to be evaluated, 
at the expense of a larger index. The three term types also need to be 
in separate "fields" (or prefixed) so that each range only matches one 
type.
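To see why the three term types must not share one plain field, here is a small sketch of the "prefixed" alternative: each date yields one term per granularity, tagged with a one-letter prefix (`y`, `m`, `d`, chosen here purely for illustration) so that a lexicographic range over year terms cannot pick up month or day terms. The `inRange` helper is a hypothetical stand-in for how a term range compares terms, not a Lucene call.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Sketch of the "prefixed" variant: all granularities share one field, but
// each term carries a (hypothetical) type prefix: y = YYYY, m = YYYYMM, d = YYYYMMDD.
class PrefixedDateTerms {

    /** The terms one date would contribute at index time. */
    static List<String> termsFor(LocalDate date) {
        String ymd = String.format("%04d%02d%02d",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth());
        return Arrays.asList("y" + ymd.substring(0, 4), "m" + ymd.substring(0, 6), "d" + ymd);
    }

    /** Inclusive lexicographic range test, mimicking how a term range is scanned. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(termsFor(LocalDate.of(2002, 6, 12))); // [y2002, m200206, d20020612]
        // Without prefixes, the day term 20000615 sorts between 2000 and 2001,
        // so a year range would wrongly enumerate it...
        System.out.println(inRange("20000615", "2000", "2001"));    // true
        // ...with prefixes, it falls outside the year range.
        System.out.println(inRange("d20000615", "y2000", "y2001")); // false
    }
}
```

Using three genuinely separate fields achieves the same isolation without the prefix letters; the prefix version just keeps everything in one field.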

The same trick can be played with non-dates by also indexing a 
two-character prefix ("dog" gets indexed as "dog" and "do"). Obviously, 
care should be taken in choosing which fields get this extra indexing 
(probably just Keyword fields).
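A rough in-memory simulation of the same trick for words, assuming each keyword is indexed both as itself and as its first two characters (the whole word if shorter) in two separate term dictionaries. `TreeMap`s stand in for Lucene's term dictionary and postings, and the class and method names are made up for illustration. The ragged ends of a range are still scanned word by word, but each whole two-letter block in the middle costs a single prefix term.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simulation only: TreeMaps play the role of two term dictionaries
// (exact words and their two-character prefixes) with their postings.
class PrefixRangeSketch {
    final TreeMap<String, Set<Integer>> words = new TreeMap<>();
    final TreeMap<String, Set<Integer>> prefixes = new TreeMap<>();
    int termsVisited; // postings lists read by the most recent query

    static String prefixOf(String word) {
        return word.length() <= 2 ? word : word.substring(0, 2);
    }

    /** Index document `id` whose keyword is `word`: "dog" -> "dog" and "do". */
    void add(int id, String word) {
        words.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
        prefixes.computeIfAbsent(prefixOf(word), k -> new TreeSet<>()).add(id);
    }

    /** Ordinary range: read every word term in [lo, hi]. */
    Set<Integer> plainRange(String lo, String hi) {
        termsVisited = 0;
        Set<Integer> hits = new TreeSet<>();
        if (lo.compareTo(hi) > 0) return hits;
        for (Set<Integer> docs : words.subMap(lo, true, hi, true).values()) {
            termsVisited++;
            hits.addAll(docs);
        }
        return hits;
    }

    /** Split range: exact words at both ends, one prefix term per whole block between. */
    Set<Integer> splitRange(String lo, String hi) {
        String pLo = prefixOf(lo), pHi = prefixOf(hi);
        if (lo.compareTo(hi) > 0 || pLo.equals(pHi)) return plainRange(lo, hi);
        termsVisited = 0;
        Set<Integer> hits = new TreeSet<>();
        // Left end: words >= lo that still share lo's prefix.
        for (Map.Entry<String, Set<Integer>> e : words.tailMap(lo, true).entrySet()) {
            if (!prefixOf(e.getKey()).equals(pLo)) break;
            termsVisited++;
            hits.addAll(e.getValue());
        }
        // Middle: prefixes strictly between the two end prefixes.
        for (Set<Integer> docs : prefixes.subMap(pLo, false, pHi, false).values()) {
            termsVisited++;
            hits.addAll(docs);
        }
        // Right end: words <= hi that share hi's prefix, scanned downwards.
        NavigableMap<String, Set<Integer>> below = words.headMap(hi, true).descendingMap();
        for (Map.Entry<String, Set<Integer>> e : below.entrySet()) {
            if (!prefixOf(e.getKey()).equals(pHi)) break;
            termsVisited++;
            hits.addAll(e.getValue());
        }
        return hits;
    }
}
```

This works because truncating to two characters preserves sort order, so all words sharing a prefix sit next to each other in the dictionary. With the fourteen keywords "ant", "apple", "apply", "bat", "bath", "cat", "catalog", "d", "do", "dog", "dogs", "door", "eel", "egg", both methods return the same documents for the range "ant" to "egg", but the plain scan reads 14 terms and the split version reads 8.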

It's an idea anyway...

- matt

On Friday, June 21, 2002, at 01:35 PM, Sylvain Puccianti wrote:

> Thanks for the quick answer!
> I've just downloaded the 1.2 release jar, and my test
> gives me the same results: the more threads I have,
> the slower date filtering gets (the performance
> degradation is almost exponential).
> I tried to use RangeQuery, as advised by Scott
> Ganyo, but it does not work very well. RangeQuery
> creates a TermQuery for each term between lowerTerm and
> higherTerm. If my range is too wide, as I've got
> thousands of documents, it just blows up memory...
> Is there any way to avoid sharing the TermInfosReader
> between all threads when creating the BitSet, or
> somehow avoid synchronizing the get method (if it is
> actually the bottleneck here)?
>
> Thanks,
>
> Sylvain
>
> --- Doug Cutting <cutting@lucene.com> wrote:
>> What version of Lucene are you using?  There was a
>> patch made in January
>> to address multi-threaded performance of DateFilter.
>>
>> Doug
>>
>>
>> --
>> To unsubscribe, e-mail:
>> <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail:
>> <mailto:lucene-dev-help@jakarta.apache.org>
>>
>



