lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces
Date Wed, 01 Apr 2009 17:51:13 GMT
Consider running this query in Luke and using "explain" for details,
but...

I'm surprised this is working at all without throwing TooManyClauses errors.
Under the covers, Lucene expands your wildcards to all terms in the field
that match. For instance, assume your document field has the following:
aa
ab
ac
ad
ae

Now, searching for a* produces a clause like:
(aa OR ab OR ac OR ad OR ae) in place of the a*
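
To make the expansion concrete, here is a minimal plain-Java sketch of the
idea. It is an illustration only, not Lucene's actual code: the class and
method names are hypothetical, and a TreeSet stands in for the field's term
dictionary. The limit of 1024 used in main is, as far as I know, the default
maximum clause count for a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustration only: mimics prefix expansion over a sorted term dictionary.
// This is not Lucene's code; the class and method names are hypothetical.
class PrefixExpansionSketch {

    // Collect every term that starts with `prefix`, failing as soon as the
    // expansion would need more than `maxClauses` clauses.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; in sorted order all
        // terms sharing the prefix are contiguous from that point.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last matching term
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException("too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Render the expanded terms the way the rewritten query would read.
    static String toOrClause(List<String> clauses) {
        return "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("aa", "ab", "ac", "ad", "ae", "ba"));
        // Expands the prefix "a" against the example terms above.
        System.out.println(toOrClause(expand(terms, "a", 1024)));
    }
}
```

Every matching term becomes its own clause object, so a prefix that matches
many distinct terms costs memory in proportion to the number of matches.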

So your query is generating ginormous OR clauses: one that
contains every term in your content field starting with 'g', another
with every term in your content field starting with 'h', etc. Since you
aren't hitting TooManyClauses, I suspect that your content field doesn't
have very many distinct terms in it....

As for why it's returning few entries, what does this part of your
query return by itself? Since it's ANDed with your wildcard query,
it might be what's limiting your results.

((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
(+viewers:cpuser9 +viewers:cpuser4))

But I'm puzzled, because saying that you're getting OOM errors
doesn't square very well with getting *any* results at all, so is
there something else going on?

Best
Erick@MoreQuestionsThanAnswers.


On Wed, Apr 1, 2009 at 1:31 PM, Lebiram <lebiram@ymail.com> wrote:

> Hi All,
>
> I have the following query on a 1GB index with about 12 million docs.
> As you can see, the terms consist of wildcards...
>
> query.toString()=+(+content:g* +content:h* +content:d* +content:s*
> +content:a* +content:w* +content:b* +content:c* +content:m* +content:e*)
> +((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
> (+viewers:cpuser9 +viewers:cpuser4))
>
> The Searcher is a MultiSearcher with 4 IndexSearchers pointing to 4
> different Lucene indexes.
> This search returns relatively few records, several tens of thousands of hits.
>
> Java is assigned 1GB max memory.
>
> Somehow this search eats the entire Java heap space and causes OOM.
> Looking at JProfiler, I see the org.apache.lucene package generating a lot of
> objects, which I believe are taking all this space.
>
> Can anyone explain the reason why this particular search might take so much
> memory?
> Is there anything I am doing wrong here?
> More importantly, is there anything I can do to improve this?
>
> -M
>
>
>
