lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Performance problems with Lucene 2.9
Date Mon, 30 Nov 2009 15:56:10 GMT
Hi

First, you can use MatchAllDocsQuery, which matches all documents. It
spares you a HUGE posting list (TAG:TAG) and performs much faster. For
example, TAG:TAG computes a score for each doc even though you don't need
it; MatchAllDocsQuery doesn't.

Second, move away from Hits! :) It is deprecated in 2.9. Use Collectors
instead.
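
A minimal sketch of what such a Collector could look like, assuming you only
need the IDs of the matching documents. The class name DocIdCollector is made
up for illustration; Collector, Scorer and IndexReader are the Lucene 2.9
classes, and the Lucene 2.9 core jar is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

/** Hypothetical collector: records the global doc ID of every hit, never asks for a score. */
public class DocIdCollector extends Collector {
    private final List<Integer> docIds = new ArrayList<Integer>();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) {
        // Scores are not needed, so the scorer is simply ignored.
    }

    @Override
    public void collect(int doc) {
        // doc is relative to the current segment; docBase makes it index-wide.
        docIds.add(docBase + doc);
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) {
        this.docBase = docBase;
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        // Order is irrelevant when only gathering IDs.
        return true;
    }

    public List<Integer> getDocIds() {
        return docIds;
    }
}
```

With the searcher and cluCF from your code, something like
searcher.search(new MatchAllDocsQuery(), cluCF, collector) should then take
the place of the Hits call. Note that this sketch ignores cluSort and keeps
every matching ID in memory, so it only makes sense when the filters narrow
the result down a lot.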

If I understand the chain of filters correctly, do you think you could code
them as a BooleanQuery to which you add BooleanClauses, each wrapping a
TermQuery on its own Term (field:value)? You can combine clauses with OR,
AND, NOT, etc. (BooleanClause.Occur.SHOULD, MUST and MUST_NOT).
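
To make this concrete, here is a rough sketch that turns two of the filters
from your example (FOREIGN_PK=12345, TYPE=a) into required clauses. The class
and method names (FilterAsQuery, build) are hypothetical, and since the
fields are UN_TOKENIZED I assume the term text has to equal the indexed
value exactly.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

/** Hypothetical helper: expresses the filter chain as one BooleanQuery. */
public class FilterAsQuery {

    /** Builds FOREIGN_PK:foreignPk AND TYPE:type. */
    public static BooleanQuery build(String foreignPk, String type) {
        BooleanQuery query = new BooleanQuery();
        // MUST makes a clause mandatory, so two MUST clauses behave like AND.
        query.add(new TermQuery(new Term("FOREIGN_PK", foreignPk)),
                  BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("TYPE", type)),
                  BooleanClause.Occur.MUST);
        return query;
    }
}
```

The query returned by FilterAsQuery.build("12345", "a") could then be passed
to searcher.search in place of TAG:TAG plus the ChainedFilter. Occur.SHOULD
would give you OR and Occur.MUST_NOT would give you NOT.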

Note that in Lucene 2.9 you can avoid scoring documents very easily, which
is a performance win if you don't need scores (i.e. if you just want the
matching documents and don't care how they rank).
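
If you still need the sorted top N (as with cluSort), one way to skip
scoring is a TopFieldCollector with score tracking switched off. This is a
sketch under assumptions: the class and method names (UnscoredSearch,
topSorted) are invented for illustration, and it targets the Lucene 2.9 API.

```java
import java.io.IOException;

import org.apache.lucene.search.Filter;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopFieldCollector;

/** Hypothetical helper: sorted top-N search that never tracks scores. */
public class UnscoredSearch {

    public static TopDocs topSorted(Searcher searcher, Filter filter,
                                    Sort sort, int n) throws IOException {
        TopFieldCollector collector = TopFieldCollector.create(
                sort,
                n,
                true,    // fillFields: return the sort values with each hit
                false,   // trackDocScores: no per-document score
                false,   // trackMaxScore: no maximum score either
                false);  // docsScoredInOrder: do not assume in-order collection
        // A null filter is allowed and means "no filtering".
        searcher.search(new MatchAllDocsQuery(), filter, collector);
        return collector.topDocs();
    }
}
```

With your objects that would be roughly
UnscoredSearch.topSorted(searcher, cluCF, cluSort, 100), where 100 is an
arbitrary page size.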

Shai

On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau <akaris@gmail.com> wrote:

> Hi,
>
> we use Lucene to store around 300 million records. We use the index both
> for conventional searching and for all the system's data - we replaced
> MySQL with Lucene because MySQL was simply not working at all due to the
> number of records. Our problem is that we have HUGE performance
> problems... whenever we search, it takes forever to return results, and
> Java uses 100% CPU/RAM.
>
> Our index fields are like this:
>
> TYPE
> PK
> FOREIGN_PK
> TAG
> ...other information depending on type...
>
> * All fields are Field.Index.UN_TOKENIZED
> * The field "TAG" always contains the value "TAG".
>
> Whenever we search in the index, our query is "TAG:TAG" to match all
> documents, and we do the search like this:
>
>        // Search
>        Hits h = searcher.search(q, cluCF, cluSort);
>
> cluCF is a ChainedFilter containing all the other filters (like
> FOREIGN_PK=12345, TYPE=a, etc.).
>
> I know that the method is probably crazy because "TAG:TAG" is matching all
> 300M documents and then it applies filters; so that's probably why every
> little query is taking 100% CPU/RAM.... but I don't know how to do it
> properly.
>
> Help ! Any advice is welcome.
>
> - Mike
> akaris@gmail.com
>
