On Oct 13, 2005, at 7:36 AM, Mikko Noromaa wrote:
> Hi,
>
>
>> It would be possible to do a PatternQuery("*") that would
>> enumerate every term.
>>
>
> Does this work differently than the current logic where wildcard queries
> are constructed as BooleanQueries with many terms OR'ed together? I think
> this would be a good change.

No, it expands into a BooleanQuery exactly as WildcardQuery does. The only
difference is how terms are matched: by regex rather than by wildcard
syntax. The added bonus is that there is a SpanPatternQuery to go along
with it, which allows "foo bar*" phrase queries.

> I have always thought that it is quite cumbersome to expand wildcards to
> many boolean clauses. I think that keeping the wildcard (or regex in this
> case) in the query object would be much better. On the other hand, it
> might not make any difference in performance, since Lucene would still
> have to go through all the terms. But at least it would avoid the
> BooleanQuery$TooManyClauses exception even with thousands of different
> terms. Right?

At this point that exception can still occur, so you need to increase the
maximum number of clauses to avoid it.

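To make the limit concrete, here is a minimal sketch of the expansion step in plain Java. It is not Lucene code: the class PatternExpansionSketch, the expand method, and the TooManyClauses class are hypothetical stand-ins, and the "talo" terms are made-up sample data. It only assumes what is described above, namely that every term in the dictionary is tested against the pattern and each match becomes one OR'ed clause, subject to a maximum (1024 by default).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch, not Lucene code: models expanding a pattern
// into one OR'ed clause per matching term, with a clause limit.
class PatternExpansionSketch {
    // Mirrors the default limit of 1024 mentioned in this thread.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Stand-in for BooleanQuery$TooManyClauses.
    static class TooManyClauses extends RuntimeException {
        TooManyClauses(int max) {
            super("more than " + max + " clauses");
        }
    }

    // Visits every term in the sorted dictionary and collects the ones
    // that fully match the regex; fails once the limit is exceeded.
    static List<String> expand(SortedSet<String> terms, String regex, int maxClauses) {
        Pattern pattern = Pattern.compile(regex);
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                if (clauses.size() >= maxClauses) {
                    throw new TooManyClauses(maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 3000; i++) {
            terms.add("talo" + i); // made-up stand-in for many suffixed forms
        }
        terms.add("auto");
        try {
            expand(terms, "talo.*", DEFAULT_MAX_CLAUSES);
        } catch (TooManyClauses e) {
            System.out.println("default limit hit: " + e.getMessage());
        }
        System.out.println(expand(terms, "talo.*", 4096).size() + " clauses with a raised limit");
    }
}
```

Note that raising the limit only moves the ceiling. The loop still visits every term, and every match is held in memory as a clause, which is where the RAM requirement comes from.
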
> I know I can increase the limit of the boolean queries, but there is
> still a limit. In my application, I index Finnish text which has lots of
> different suffixes for the same word. With compound words included, I
> could easily imagine that the same base word may have hundreds or
> thousands of terms in the index.

Hundreds of terms is still under the built-in default limit of 1024
clauses for BooleanQuery. Thousands is doable if you increase the limit
and have sufficient RAM.

For suffix wildcards, there really is no difference between my
PatternQuery and WildcardQuery. WildcardQuery may even be faster if
its matching is quicker than regex matching. Tests would be needed to
confirm that, but I'd guess the performance difference is small.

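As a rough illustration of why the two behave the same for suffix wildcards, the sketch below selects terms with a "foo*" wildcard and with the regex "foo.*" and compares the results. This is plain Java, not Lucene code: the class PrefixMatchSketch and its methods are hypothetical, and the wildcard matcher only handles a literal prefix followed by one trailing "*".

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not Lucene code: compares wildcard-style and
// regex-style term selection for patterns of the form "prefix*".
class PrefixMatchSketch {
    // Handles only a literal prefix followed by a single trailing '*'.
    static boolean wildcardPrefixMatch(String pattern, String term) {
        if (!pattern.endsWith("*")) {
            throw new IllegalArgumentException("expected a pattern ending in '*'");
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        return term.startsWith(prefix);
    }

    // Keeps the terms that match the wildcard pattern.
    static List<String> selectByWildcard(List<String> terms, String pattern) {
        return terms.stream()
                .filter(t -> wildcardPrefixMatch(pattern, t))
                .collect(Collectors.toList());
    }

    // Keeps the terms that fully match the regex.
    static List<String> selectByRegex(List<String> terms, String regex) {
        Pattern compiled = Pattern.compile(regex);
        return terms.stream()
                .filter(t -> compiled.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("bar", "fo", "foo", "foobar", "food");
        List<String> byWildcard = selectByWildcard(terms, "foo*");
        List<String> byRegex = selectByRegex(terms, "foo.*");
        System.out.println("same terms selected: " + byWildcard.equals(byRegex));
    }
}
```

Both paths test every term and select the same set, so any difference comes down to the per-term matching cost, which this sketch does not measure.
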
Erik