lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dave Kor" <dave....@nexusedge.com>
Subject RE: OutOfMemoryErrors searching with WildCardQueries
Date Thu, 12 Jun 2003 07:12:13 GMT
OOM occurs only if the prefix/wild/fuzzy query matches with a large
percentage of the terms in the index due to their use of term expansion.
Placing a limit to expand only the first x matching terms would solve this
problem, that is assuming that the tradeoff in accuracy is acceptable.


Dave Kor Kian Wei
Consultant
Product Engineering
NexusEdge Technologies Pte. Ltd.
6 Aljunied Ave 3, #01-02 (Level 4)
Singapore 389932
Tel : (+65)848-2552
Fax : (+65)747-4536
Web : www.nexusedge.com

> -----Original Message-----
> From: Konrad Kolosowski [mailto:konradk@ca.ibm.com]
> Sent: Thursday, June 12, 2003 9:26 AM
> To: Lucene Users List
> Subject: OutOfMemoryErrors searching with WildCardQueries
>
>
> I need to proof an on-line system against Out Of Memory Errors, that some
> times crash our system.  The system allows boolean searches with wild
> cards.
>
> It is not recommended to use WildCardQuery with wild card at the first
> position.   Having wildcard at first position works for small number of
> documents in the index but results in errors for a larger index
> (containing
> 3k of 1-2 pages docs).  If one types a query with many wild
> cards, close to
> the beginning of terms, e.g.  a* OR b* OR ... OR z*, is not it going to
> lead to the same problem?
>
> If I impose a requirement that not first one but first 3 letters of a word
> in a query cannot be a wild card.  Will it provide an additional
> safety and
> reduce the memory consumption during search?  If it does than I think it
> probably would not help when index contains large number of terms with
> common prefix anyway.
>
> If the index grows to hundred thousand documents, with users
> simultaneously
> searching indexes for different locales, what is the best way to cup the
> memory requirement?  Limiting number of terms, or number of terms
> containing wild cards, or eliminating wild card searches altogether.
>
> Thanks for explanation or any pointers.
>
> Konrad Kolosowski
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message