lucene-general mailing list archives

From "Scott Tiger" <m.scott.ti...@gmail.com>
Subject Re: performance of sorting by date
Date Mon, 10 Sep 2007 18:07:43 GMT
I have just tested this case myself.

> 1. index the datetime as one field.

In this case the first query (not served from the cache) responds very
slowly. It seems that the FieldCache it has to build is too big.
The second query is very fast. It seems to be served from the cache.
I also cannot use RangeQuery, because it expands to too many clauses.

e.g. datetime:[20070101000000 TO 20071231235959]
This range can contain up to 31,536,000 terms.
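
As a rough check of that number, here is a small sketch in plain Java (no Lucene involved). The class and method names are made up for illustration. It counts the per-second values in the range and compares the count with 1024, which is the default BooleanQuery clause limit as far as I know (the limit can be raised with BooleanQuery.setMaxClauseCount).

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper, not part of Lucene: counts second-resolution
// timestamps that a range over a single datetime field could cover.
class RangeTermCount {
    // Assumed default BooleanQuery clause limit.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Number of distinct per-second values in [from, to], both ends inclusive.
    public static long secondsInRange(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to).getSeconds() + 1;
    }

    public static void main(String[] args) {
        LocalDateTime from = LocalDateTime.of(2007, 1, 1, 0, 0, 0);
        LocalDateTime to = LocalDateTime.of(2007, 12, 31, 23, 59, 59);
        long terms = secondsInRange(from, to);
        System.out.println("possible terms: " + terms);
        System.out.println("exceeds clause limit: " + (terms > DEFAULT_MAX_CLAUSES));
    }
}
```

This is only the worst case: the real number of clauses is the number of distinct values actually present in the index, which can still be far above the limit with 2,000,000 documents.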

> 2. index the datetime as 6 fields.

This is what I recommend. The first query is not slow; it is fast.
The second query is also very fast.
A further advantage is that RangeQuery works and is very fast.

e.g. yyyy:[2007 TO 2007] AND mm:[1 TO 4]
This matches only 5 terms (1 year term + 4 month terms).
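
A minimal sketch of how the six field values could be produced from one timestamp string, in plain Java without Lucene. The class name and the field names other than yyyy and mm are made up for illustration. It keeps the values zero-padded ("01" to "12"), on the assumption that the range query compares terms as strings; with unpadded values, mm:[1 TO 4] would also match 10, 11 and 12.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: splits a yyyyMMddHHmmss
// timestamp into six zero-padded values, one per index field.
class DateFieldSplitter {
    // "yyyy" and "mm" follow the query above; the other names are assumptions.
    static final String[] NAMES = {"yyyy", "mm", "dd", "hh", "mi", "ss"};
    // Start offset of each field in the 14-digit string, plus the end offset.
    static final int[] BOUNDS = {0, 4, 6, 8, 10, 12, 14};

    public static Map<String, String> split(String timestamp) {
        if (timestamp == null || !timestamp.matches("\\d{14}")) {
            throw new IllegalArgumentException("expected 14 digits: " + timestamp);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            // substring keeps leading zeros, so string order equals numeric order
            fields.put(NAMES[i], timestamp.substring(BOUNDS[i], BOUNDS[i + 1]));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("20070910180743"));
    }
}
```

Each value would then be added to the document as its own untokenized field, so no field ever holds more than 60 distinct terms.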

I have 2,000,000 documents in the index, and
the first query responds in about 1500 ms.

Thanks.

2007/9/7, Scott Tiger <m.scott.tiger@gmail.com>:
>
> I want to search documents, sorted mainly by a datetime field.
> Which implementation gives the best sorting performance?
>
> 1. index the datetime as one field.
>
> fields: title, contents, datetime
>
> In this case, if there is a document for every second between the start
> and the end of 2007, the number of terms becomes 31,536,000
> (365 * 24 * 60 * 60 seconds).
>
> 2. index the datetime as 6 fields.
>
> fields: title, contents, year, month, day, hour, minute, second.
>
> In this case, Term of each field is,
>  year : 1
>  month : 12
>  day : 31
>  hour : 24
>  minute : 60
>  second : 60
> 188 terms in total. But sorting needs all 6 fields.
>
> sample code:
> String[] sortFields = { "year", "month", "day", "hour", "minute", "second" };
> Sort sort = new Sort(sortFields);
> Hits hits = searcher.search(query, sort);
>
> (A 3rd approach: index the datetime as 2 fields, yyyymmdd and hhmmss.)
>
> I also need to reopen the index periodically, about every 1-10 minutes
> (to add/delete documents).
>
>
> Thanks.
>
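
The six-field sort in the quoted sample code amounts to comparing the fields one after another, from year down to second. A minimal sketch of that ordering in plain Java, without Lucene; the class name and the int[] layout are made up for illustration, and the values are compared as numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Lucene code: orders documents by
// {year, month, day, hour, minute, second}, most significant field first.
class SixFieldSort {
    static final Comparator<int[]> BY_FIELDS = (a, b) -> {
        for (int i = 0; i < 6; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c; // first differing field decides the order
            }
        }
        return 0; // all six fields are equal
    };

    // Returns a sorted copy; the input list is left untouched.
    public static List<int[]> sorted(List<int[]> docs) {
        List<int[]> copy = new ArrayList<>(docs);
        copy.sort(BY_FIELDS);
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> docs = Arrays.asList(
            new int[]{2007, 9, 10, 18, 7, 43},
            new int[]{2007, 9, 7, 23, 59, 59},
            new int[]{2007, 1, 1, 0, 0, 0});
        for (int[] d : sorted(docs)) {
            System.out.println(Arrays.toString(d));
        }
    }
}
```

The result is the same chronological order that a single datetime field would give; the cost is that six small per-field caches are consulted instead of one large one.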
