lucene-java-user mailing list archives

From Sanyi
Subject RE: Bug in the BooleanQuery optimizer? ..TooManyClauses
Date Thu, 11 Nov 2004 19:57:32 GMT
Yes, I understand all of this, but I don't want to set it to MaxInt, since that can easily lead to (even accidental) DoS attacks.

What I'm saying is that there is no reason for the optimizer to expand wild* to more than variations when I search for "somerareword AND wild*". Since somerareword is only present in, let's say, 100 documents, wild* should only expand to words beginning with "wild" in those 100 documents, and then it would work fine with the default 1024-clause limit.

But it doesn't, so I'm left choosing between unusable queries and accidental DoS attacks.
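A toy model of the behaviour proposed above, in plain Java rather than Lucene classes. `RestrictedExpansionDemo`, its in-memory postings map, and its per-document term lists are all hypothetical, and this is not how Lucene 1.4 rewrites wildcard queries. It evaluates "somerareword AND wild*" by first collecting the documents that contain the rare term and then expanding `wild*` only over the terms of those documents, so a missing required term short-circuits to an empty expansion:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RestrictedExpansionDemo {

    // Hypothetical in-memory index: term -> ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Hypothetical per-document term lists (a forward index). This is an
    // assumption of the sketch, not something the thread says Lucene provides.
    private final Map<Integer, Set<String>> docTerms = new HashMap<>();

    void addDocument(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            docTerms.computeIfAbsent(docId, d -> new TreeSet<>()).add(term);
        }
    }

    // Evaluates "requiredTerm AND prefix*": find the documents containing
    // requiredTerm first, then expand prefix* only over their terms.
    Set<String> expandWithin(String requiredTerm, String prefix) {
        Set<Integer> candidates = postings.getOrDefault(requiredTerm, Collections.emptySet());
        Set<String> variants = new TreeSet<>();
        // If requiredTerm is absent, candidates is empty and nothing is expanded.
        for (int docId : candidates) {
            for (String term : docTerms.get(docId)) {
                if (term.startsWith(prefix)) {
                    variants.add(term);
                }
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        RestrictedExpansionDemo index = new RestrictedExpansionDemo();
        index.addDocument(1, "somerareword", "wilderness", "trail");
        index.addDocument(2, "somerareword", "wild");
        // Thousands of other "wild*" terms exist elsewhere in the index.
        for (int i = 3; i < 3003; i++) {
            index.addDocument(i, "wild" + i);
        }
        System.out.println(index.expandWithin("somerareword", "wild"));    // [wild, wilderness]
        System.out.println(index.expandWithin("nonexistingword", "wild")); // []
    }
}
```

Under these assumptions the clause count depends on the documents matching the rare term rather than on the whole term dictionary. The trade-off is that the sketch needs per-document term access, and it only pays off when the required term is rare.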

--- Will Allen wrote:

> Any wildcard search will automatically expand your query into one clause for each term it finds in the index that matches the wildcard.
> For example:
> wild* would become wild OR wilderness OR wildman, etc., for each of the terms that exist in your index.
> This is why you quickly reach the 1024-clause limit. I automatically set it to max int with the following line:
> BooleanQuery.setMaxClauseCount( Integer.MAX_VALUE );
> -----Original Message-----
> From: Sanyi
> Sent: Thursday, November 11, 2004 6:46 AM
> To:
> Subject: Bug in the BooleanQuery optimizer? ..TooManyClauses
> Hi!
> First of all, I've read about BooleanQuery$TooManyClauses, so I know that the clause limit is 1024 by default,
> which is good enough for me, but I still think it behaves strangely.
> Example:
> I have an index with about 20 million documents.
> Let's say that there are about 3000 variants of the word mask cab* in the entire document set.
> Let's say that about 500 documents contain the word "spectrum".
> Now, when I search for "cab* AND spectrum", I don't expect it to throw an exception.
> It should first restrict the search to the 500 documents containing the word "spectrum", then
> collect the variants of "cab*" within these documents, which turns out to be two or three
> variants of "cab*" (cable, cables, maybe some more), and the search should return, let's say, 10
> documents.
> Similar example: when I search for "cab* AND nonexistingword", it still throws a TooManyClauses
> exception instead of returning "No results". Since there is no "nonexistingword" in my document set,
> it doesn't even have to start collecting the variations of "cab*".
> Is there any patch for this issue?
> Thank you for your time!
> Sanyi
> (I'm using: lucene 1.4.2)
> P.S.: Sorry for resending this message; I first sent it by accident as a reply to the wrong thread.
> __________________________________ 
> Do you Yahoo!? 
> Check out the new Yahoo! Front Page. 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
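A toy model of the expansion Will Allen describes, in plain Java. `WildcardExpansionDemo` and `TooManyClausesException` are hypothetical stand-ins for the query-rewrite step and for `BooleanQuery.TooManyClauses`, not Lucene API. Every term in the whole dictionary that starts with the prefix becomes one OR clause, independent of any other term in the query, so with roughly 3000 "cab*" variants (the figure from the original message) the default limit of 1024 is exceeded:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class WildcardExpansionDemo {

    // Hypothetical stand-in for BooleanQuery.TooManyClauses; not the Lucene class.
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    // Expands prefix* against the entire term dictionary, as the thread describes:
    // one OR clause per matching term, checked against a clause limit.
    static List<String> expandPrefix(SortedSet<String> termDictionary, String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        // Terms sharing a prefix form a contiguous range in sorted order.
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the prefix range
            }
            if (clauses.size() == maxClauseCount) {
                throw new TooManyClausesException(maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>();
        // Simulate 3000 distinct "cab*" terms in a large index, plus two others.
        for (int i = 0; i < 3000; i++) {
            dictionary.add("cab" + i);
        }
        dictionary.add("cable");
        dictionary.add("spectrum");

        try {
            expandPrefix(dictionary, "cab", 1024);
            System.out.println("expanded");
        } catch (TooManyClausesException e) {
            System.out.println("TooManyClauses: " + e.getMessage()); // TooManyClauses: more than 1024 clauses
        }
        // With an effectively unlimited clause count the expansion succeeds.
        System.out.println(expandPrefix(dictionary, "cab", Integer.MAX_VALUE).size()); // 3001
    }
}
```

In this model, raising the limit (as `BooleanQuery.setMaxClauseCount( Integer.MAX_VALUE )` does in the thread) only removes the check. The number of clauses built still grows with the term dictionary, which is the DoS concern raised in the reply.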
