lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Staveley (Tom)" <>
Subject RE: WilcardQuery and memory
Date Fri, 09 Mar 2007 12:43:37 GMT
For indexing e-mail, I recommend that you tokenise the e-mail addresses into
fragments and query on the fragments as whole terms rather than using

Rather than looking for fischauto333* in (say) smtp-from, look for
fischauto333 in (say) an additional field called smtp-from-fragments to
match the term fischauto333 (it also contains the terms and yahoo
and de).

Having whole e-mail addresses in terms and using prefix/wildcard queries
inevitably results in too many clauses.

-----Original Message-----
From: Joe [] 
Sent: 09 March 2007 12:08
Subject: WilcardQuery and memory


Here we use lucene to index our emails, currently 500.000 Documents.
When Searching the body by a WildcardQuery the problems arises.

I did some profiling with JProfiler. I see the more BooleanClause 
instances used
the more memory is required during search.
Most memory is used by instances of classes SegementTermDocs and 
Nearly 30 MB/1000 BooleanClause instances.

So when i limit the BooleanClauses by BooleanQuery.setMaxClauseCount(5000);
and more and more emails gets indexed, will this lead to a OutOfMemoryError
or will the 30MB/1000 Clauses ratio still hold?

What should i do to prevent OOM without reducing 
BooleanQuery.setMaxClauseCount() to much?

Der frühe Vogel fängt den Wurm. Hier gelangen Sie zum neuen Yahoo! Mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message