A thought - would you (or the project lead;-) consider limiting the 'wildcard expansion'? Assuming a query like: ( uni* near(5) science ) I.e. match docs with any word with prefix "uni" that spans no further than 5 from the word "science". Assume current lexicon has M (say 1200) words starting with "uni", you could (manually) select the first N (say 50) words from this list. You could also select the 'best' N words - defining 'best' in correlation with those words DF for instance. This has a few issues: - what N value is good enough? - how to select the N words? - you would have to interfere with the code that expands the wildcard. - search results recall may degrade because it might happen that words selected as 'best' would not pass the 'span test' while some of the words that were not selected would have passed the span test. But it might be practical, and perhaps, hopefully, satisfactory. Regards, Doron "Erick Erickson" wrote on 02/08/2006 11:17:19: > I'm almost entirely certain that any value I choose for setMaxClauseCount is > going to be wrong, but I might give it a try. > > Erick > > On 8/2/06, Paul Elschot wrote: > > > > On Wednesday 02 August 2006 17:29, Erick Erickson wrote: > > > I'm back, with another flavor of wildcards. What direction would you > > point a > > > poor boy who's project lead wants wildcard queries and spans? Here's the > > > problem.... > > > > > > I cannot use any of the classes that throw a "TooManyClauses" exception > > (e.g. > > > SpanRegexQuery or SpanNearQuery with, say WildCardQuery). The corpus is > > big > > > enough that this is guaranteed to be thrown. So, currently I'm using a > > > filter for wildcard queries, populating it via WildcardTermEnum and > > > TermDocs... Works like a champ. But I don't see how to combine this with > > > spans... > > > > You can try BooleanQuery.setMaxClauseCount() to increase the max. nr. of > > clauses to 100000 or so and see what happens when searching. > > With enough RAM it should work nicely. > > > > You could also use the surround query language. This allows to set > > the max. nr. of clauses for a whole query instead of per BooleanQuery. > > > > Regards, > > Paul Elschot > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > > For additional commands, e-mail: java-user-help@lucene.apache.org > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org