lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: toomanyclauses exception
Date Wed, 27 Dec 2006 15:53:51 GMT
Also, see the thread on this list titled "I just don't get wildcards at all"
to see an extensive discussion of this issue, as well as wildcards in
general. You might also search the archive for wildcards. The short form is
that any wildcard (including prefix queries) expands under the covers to
create a clause for each possible entry in the index for that field. For
instance, say a field had the following values:

abcd
abck
abt

Searching for ab* would expand to searching for abcd, abck and abt under the
covers. When the number of expanded clauses exceeds the default limit of
1024, you see a TooManyClauses exception. Raising the limit
*may* fix you right up, but on any reasonably sized index, you can come up
with a query that'll exceed whatever number you set. Or you'll hit an
unacceptable performance/memory footprint. Imagine your query with terms
like a*
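
To make the expansion concrete, here is a minimal sketch in plain Java. It
does not call Lucene; it only models the behavior described above. The class
name, the expand() helper and the use of IllegalStateException are all made
up for illustration, and a sorted set stands in for the terms of one field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java model of prefix expansion. Hypothetical names; not Lucene code.
class PrefixExpansionSketch {

    // Stand-in for the default clause limit of 1024 mentioned above.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Collects one "clause" per indexed term that starts with the prefix.
    // Throws once the expansion would exceed maxClauses.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous and
        // start at the first term >= prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxClauses) {
                throw new IllegalStateException("too many clauses for " + prefix + "*");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList("abcd", "abck", "abt", "xyz"));
        // prints [abcd, abck, abt]
        System.out.println(expand(terms, "ab", DEFAULT_MAX_CLAUSES));
    }
}
```

With a limit of 2 instead of 1024, the same call throws, which is the
situation the original poster ran into on a much larger index.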

Think seriously about how you're going to deal with this. There are several
options:
1> use filters for all your wildcard clauses and create your own
BooleanQuery. Be aware that using filters affects scoring.
2> Assume that any query that throws a TooManyClauses exception (after
you've set a suitable max as Paul suggested) is too broad to be useful and
respond to the user with some polite phrase asking them to refine the query.
3> Look over the SrndQuery classes. I don't fully understand these, but they
certainly behave much differently in this area. Note that SrndQuery limits
wildcards to having at least three non-wildcard characters.
4> Ask whether stemming is a complete or partial solution. Ditto for
Soundex. There's a good chance these won't apply, but they may.
5> <Insert the solution to your specific problem here>
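
One cheap guard in the spirit of option 3 is to reject overly short
wildcard terms before the query is ever built. The sketch below is plain
Java and is only modelled on the three-character restriction mentioned
above; it is not the SrndQuery implementation, and the class name, method
name and constant are hypothetical.

```java
// Hypothetical pre-check for wildcard terms; not part of Lucene.
class WildcardGuardSketch {

    // Assumed threshold, taken from the three-character rule described above.
    static final int MIN_LITERAL_CHARS = 3;

    // Returns true if the term has no wildcard at all, or has at least
    // MIN_LITERAL_CHARS characters that are not '*' or '?'.
    static boolean isAcceptable(String term) {
        int literalChars = 0;
        boolean hasWildcard = false;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                hasWildcard = true;
            } else {
                literalChars++;
            }
        }
        return !hasWildcard || literalChars >= MIN_LITERAL_CHARS;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("exhibit*")); // prints true
        System.out.println(isAcceptable("a*"));       // prints false
    }
}
```

A check like this does not bound the expansion (abc* can still match
thousands of terms), so it complements the clause limit rather than
replacing it.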

This is a sticky wicket that will probably consume more time than you think
to handle. It's easy for your product manager to claim that "Of course, we
must support arbitrary wildcards", but I'd urge you to seriously ask what
value *arbitrary* wildcards bring to the product. When you start getting
thousands of responses to a query, is it actually valuable to return them to
the user? Or do you give her just as much value (and deliver product sooner)
by telling her up front that she's getting too many responses to be useful?
With this last strategy, you just catch the TooManyClauses exception and
respond with "refine your query".
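
The catch-and-respond strategy could look roughly like the sketch below.
To keep it runnable without Lucene on the classpath, it uses a stand-in
exception class and a Supplier in place of the real search call; both are
assumptions, as are the class name and the message text. In real code the
catch block would name BooleanQuery.TooManyClauses, the class shown in the
stack trace.

```java
import java.util.function.Supplier;

// Hypothetical wrapper around a search call; not Lucene code.
class RefineQuerySketch {

    // Stand-in for org.apache.lucene.search.BooleanQuery.TooManyClauses.
    static class TooManyClauses extends RuntimeException {
    }

    static final String REFINE_MESSAGE =
            "Your query matches too many terms. Please refine it.";

    // Runs the search (modelled as a supplier of a hit count) and turns a
    // TooManyClauses failure into a polite message instead of an error page.
    static String searchOrAskToRefine(Supplier<Integer> search) {
        try {
            int hits = search.get();
            return hits + " results";
        } catch (TooManyClauses e) {
            return REFINE_MESSAGE;
        }
    }

    public static void main(String[] args) {
        System.out.println(searchOrAskToRefine(() -> 12)); // prints 12 results
        System.out.println(searchOrAskToRefine(() -> {
            throw new TooManyClauses();
        }));
    }
}
```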

Best
Erick


On 12/27/06, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> Chris,
>
> On Wednesday 27 December 2006 15:42, Chris Salem wrote:
> > Hi All,
> >
> > I'm getting a 'TooManyClauses' Exception and I'm not sure how to fix this.
> > Here's a sample query that I'm using:
> >
> > +(+freeform_text:exhibit* +(+freeform_text:dispaly +freeform_text:event*) +(+freeform_text:sale* +freeform_text:sells +freeform_text:develop*) +(+freeform_text:trade +freeform_text:show +freeform_text:trade +freeform_text:shows)) +degree_type:5 +position_desired:ftp +city:washington~0.5 +state:dc +ncountry:usa +last_modified:[2005-12-26 TO 2006-12-26]
> >
> > Here's the exception I'm getting:
> >
> > org.apache.lucene.search.BooleanQuery$TooManyClauses
> >  at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:160)
> >  at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:151)
> >  at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:52)
>
> One of the prefix queries is causing this, possibly event* or sale*.
> Since they seem to be specific enough, increasing the maximum number
> of boolean clauses that can be added to a boolean query appears to be
> a good way to fix this; see BooleanQuery.setMaxClauseCount().
>
> Regards,
> Paul Elschot
>
> >  at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> >  at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> >  at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> >  at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:137)
> >  at org.apache.lucene.search.Query.weight(Query.java:93)
> >  at org.apache.lucene.search.Hits.<init>(Hits.java:41)
> >  at org.apache.lucene.search.Searcher.search(Searcher.java:44)
> >  at org.apache.lucene.search.Searcher.search(Searcher.java:36)
> >  at net.mainsequence.pcr.lucene.LuceneHandler.multiSearch(LuceneHandler.java:382)
> >  at net.mainsequence.pcr.lucene.LuceneServlet.searchIndex(LuceneServlet.java:169)
> >  at net.mainsequence.pcr.lucene.LuceneServlet.processRequest(LuceneServlet.java:83)
> >  at net.mainsequence.pcr.lucene.LuceneServlet.doPost(LuceneServlet.java:72)
> >  at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
> >  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> >  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
> >  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> >  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> >  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
> >  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
> >  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
> >  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
> >  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
> >  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
> >  at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
> >  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
> >  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
> >  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
> >  at java.lang.Thread.run(Unknown Source)
> >
> > Is there any way to increase the number of clauses Lucene can take? This
> > kind of large query is not uncommon, so any help would be greatly
> > appreciated.
> >
> >
> > Chris Salem
> > 440.946.5214 x5458
> > chris@mainsequence.net
> >
> > (The following links were included with this email:)
> > mailto:chris@mainsequence.net
>
