lucene-java-user mailing list archives

From: Lebiram <>
Subject: Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces
Date: Thu, 02 Apr 2009 16:33:21 GMT
I think I have looked at constant score queries; however, relevance is of value to the users,
so we left it as is. :(

Erick's idea of stripping wildcard terms that have fewer than an acceptable number of
characters is a good one, and I might try it once I get the time.
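
A minimal sketch of that stripping idea, assuming query terms arrive as plain strings and that "acceptable" means at least three literal characters before the first wildcard. The class name, the helper names, and the threshold are all hypothetical; nothing here is a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShortWildcardGuard {
    // Hypothetical threshold: literal characters required before the first wildcard.
    static final int MIN_PREFIX_LENGTH = 3;

    /** Number of characters before the first '*' or '?', or -1 if the term has no wildcard. */
    static int literalPrefixLength(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return i;
            }
        }
        return -1;
    }

    /** Keeps plain terms and wildcard terms whose literal prefix is long enough. */
    static List<String> stripShortWildcards(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            int prefix = literalPrefixLength(term);
            if (prefix == -1 || prefix >= MIN_PREFIX_LENGTH) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("g*", "ha*", "cpuser9", "lucen*");
        System.out.println(stripShortWildcards(input)); // prints [cpuser9, lucen*]
    }
}
```

Whether to drop a short wildcard term silently or reject the whole query with a message to the user is a separate design choice; the sketch only shows the filtering step.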



From: Mark Miller <>
Sent: Thursday, April 2, 2009 3:52:24 PM
Subject: Re: Search using MultiSearcher generates OOM on a 1GB total  Partitioned indeces

You might try a constant score wildcard query (similar to a filter) - I 
think you'd have to grab it from Solr's codebase until 2.9 comes out
though. No clause limit, and reportedly *much* faster on large indexes.

- Mark
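
A rough sketch of why a filter-style approach is cheap, using only java.util.BitSet rather than Lucene's own Filter or ConstantScoreQuery classes. It assumes one bit per document per starting letter, which is where the "about 1.5M each" figure quoted below comes from for 12 million docs. The class and method names are hypothetical, and the tiny in-memory "index" is made up for illustration.

```java
import java.util.BitSet;

class LetterFilterSketch {
    /** Bytes needed to hold one bit per document. */
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** One bit set per letter a-z: bit d is set when document d has a term starting with that letter. */
    static BitSet[] buildFilters(String[][] docTerms) {
        BitSet[] filters = new BitSet[26];
        for (int i = 0; i < 26; i++) {
            filters[i] = new BitSet(docTerms.length);
        }
        for (int doc = 0; doc < docTerms.length; doc++) {
            for (String term : docTerms[doc]) {
                if (term.isEmpty()) {
                    continue;
                }
                char c = Character.toLowerCase(term.charAt(0));
                if (c >= 'a' && c <= 'z') {
                    filters[c - 'a'].set(doc);
                }
            }
        }
        return filters;
    }

    /** Documents that have a term for every requested starting letter (AND of the filters). */
    static BitSet matchAll(BitSet[] filters, String letters) {
        BitSet result = null;
        for (char c : letters.toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("expected a lowercase letter a-z: " + c);
            }
            BitSet filter = filters[c - 'a'];
            if (result == null) {
                result = (BitSet) filter.clone();
            } else {
                result.and(filter);
            }
        }
        return result == null ? new BitSet() : result;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"golf", "hat", "dog"},
            {"gate", "sun"},
            {"ham", "deer", "grape"}
        };
        BitSet[] filters = buildFilters(docs);
        System.out.println(matchAll(filters, "ghd"));   // prints {0, 2}
        System.out.println(bitsetBytes(12_000_000L));   // prints 1500000
    }
}
```

The cost is fixed by the document count, not by how many terms match the wildcard, so no clause limit applies. The trade-off is the one raised at the top of this message: a bit set records only match or no match, so per-term relevance scoring is lost.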

Lebiram wrote:
> Hi Erick,
> The query was test data, basically in anticipation of searches across all indices (4 indexes)
> with 12 million docs that should yield very small results. Obviously that query does not
> happen in real life, but it did break the system.
> If some user thought of just inputting random words, the system would be brought to its
> knees and eventually die.
> Essentially, each of our Lucene indexes has about 8 fields; 1 field is used as a filter and
> the rest are normal fields that can accept wildcards.
> You have a point about Filters being useful for a few other fields we do have. I'll apply
> So that leaves about 5 fields that allow fuzzy search.
> Which goes back to the max clause problem. Lucene's default max clause count is 1024; is
> there any reason behind this limit?
> Thanks,
> M
> ________________________________
> From: Erick Erickson <>
> To:
> Sent: Thursday, April 2, 2009 2:34:47 PM
> Subject: Re: Search using MultiSearcher generates OOM on a 1GB total  Partitioned indeces
> Ah, I get it now. Given that you bumped your max clause up, it makes
> sense. I'm pretty sure that the wildcard expansion is the root of your
> memory problems. The folks on the list helped me out a lot understanding
> what wildcards were about, see the thread titled "I just don't get wildcards
> at all" in the searchable archives from several years ago...
> Why do you want to generate queries of the form you showed? I'm
> wondering if this is an XY problem and if you gave us a higher level
> description of the problem you're trying to solve we'd be able to
> suggest other approaches. I have a really hard time imagining a use
> case where a user is well served by a clause that says
> "any document that has word beginning with g and h and d and s.....",
> so I'm assuming you're trying to solve something specific to your
> domain.....
> But if you really, truly do require this form, consider Filters. If your
> problem really requires single-letter starts, consider creating 26
> Filters at start up time and use those (see ConstantScoreQuery).
> That'll chew up about 1.5 MB of memory each, faaaaar less than
> you're consuming presently and will be blazingly fast. If you're
> not limited to single-characters, *still* consider filters. They'll
> consume little memory and are quite speedy to construct.
> Best
> Erick
> On Thu, Apr 2, 2009 at 5:04 AM, Lebiram <> wrote:
>> Hi Erick,
>> I did a search just as the JVM started... so I'm thinking that the JVM is busy
>> with some startup stuff... and that this search required more memory than
>> what is available at that time.
>> Had I done this search a while after the JVM has started, then this query
>> succeeds.
>> I then pump in several similar queries running on a different thread and it
>> takes a long time but still runs to completion until one of them generates
>> OOM. But still, queries like this are just using too much memory.
>> As for clauses, the BooleanQuery was set to max clause of... 9,000,000
>> I'm guessing that might have caused the usage of too much memory?
>> I'll try the explain you've suggested.
>> Thanks,
>> M
>> ________________________________
>> From: Erick Erickson <>
>> To:
>> Sent: Wednesday, April 1, 2009 6:51:13 PM
>> Subject: Re: Search using MultiSearcher generates OOM on a 1GB total
>>  Partitioned indeces
>> Think about putting this query in Luke and doing an "explain" for details,
>> but....
>> I'm surprised this is working at all without throwing TooManyClauses
>> errors.
>> Under the covers, Lucene expands your wildcards to all terms in the field
>> that match. For instance, assume your document field has the following:
>> aa
>> ab
>> ac
>> ad
>> ae
>> Now, searching for a* produces a clause like:
>> (aa OR ab OR ac OR ad OR ae) in place of the a*
>> So your query is generating ginormous OR clauses, one that
>> contains every term in your content field starting with 'g'. Another
>> with every term in your content field starting with 'h' etc. So I suspect
>> that your content field doesn't have very many distinct terms in it....
>> As for why it's returning few entries, what does this part of your
>> query return by itself? Since it's ANDed with your wildcard query,
>> it might be what's limiting your results.
>> ((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
>> (+viewers:cpuser9 +viewers:cpuser4))
>> But I'm puzzled, because saying that you're getting OOM errors
>> doesn't square very well with getting *any* results at all, so is
>> there something else going on?
>> Best
>> Erick@MoreQuestionsThanAnswers.
>> On Wed, Apr 1, 2009 at 1:31 PM, Lebiram <> wrote:
>>> Hi All,
>>> I have the following query on a 1GB index with about 12 million docs :
>>> As you can see the terms consist of wildcards...
>>> query.toString()=+(+content:g* +content:h* +content:d* +content:s*
>>> +content:a* +content:w* +content:b* +content:c* +content:m* +content:e*)
>>> +((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
>>> (+viewers:cpuser9 +viewers:cpuser4))
>>> The Searcher is a MultiSearcher with 4 IndexSearchers pointing to 4
>>> different Lucene indexes.
>>> This search returns very few records (several tens of thousands of hits).
>>> Java is assigned with 1GB max memory.
>>> Somehow this search eats the entire java heap space and causes OOM.
>>> Looking at JProfiler, I see the org.apache.lucene package generating a lot of
>>> objects, which I believe are taking all this space.
>>> Can anyone explain the reason why this particular search might take so much
>>> memory?
>>> Is there anything I am doing wrong here?
>>> More importantly, is there anything I can do to improve this?
>>> -M
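
The wildcard expansion Erick describes in this thread (a* becoming an OR over every matching term) can be sketched with a sorted set standing in for the field's term dictionary. This is an illustration under stated assumptions, not Lucene's actual rewrite code: the class and method names are hypothetical, and the limit is passed in as a plain parameter, with 1024 being the default max clause count cited above.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansionSketch {
    /** All terms starting with the prefix, found by a range scan over the sorted terms. */
    static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        // Upper bound: the prefix followed by the largest char value (exclusive).
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    /** Renders the expansion as one OR clause, refusing to go past the clause limit. */
    static String toOrClause(TreeSet<String> terms, String prefix, int maxClauses) {
        SortedSet<String> matches = expand(terms, prefix);
        if (matches.size() > maxClauses) {
            throw new IllegalStateException(
                prefix + "* expands to " + matches.size() + " terms, limit is " + maxClauses);
        }
        return "(" + String.join(" OR ", matches) + ")";
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList("aa", "ab", "ac", "ad", "ae", "ba"));
        System.out.println(toOrClause(terms, "a", 1024)); // prints (aa OR ab OR ac OR ad OR ae)
    }
}
```

Each matching term becomes its own clause, so the work and memory grow with the number of distinct terms under the prefix. Raising the limit to 9,000,000, as in the thread, removes the guard without removing that cost.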
