lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: wildcards and spans
Date Wed, 02 Aug 2006 15:43:09 GMT
Well, I can't do option 2b: TooManyClauses again. I should have realized that
the terms are assembled independently....

Erick

On 8/2/06, Erick Erickson <erickerickson@gmail.com> wrote:
>
> I'm back, with another flavor of wildcards. What direction would you point
> a poor boy whose project lead wants wildcard queries and spans? Here's the
> problem....
>
> I cannot use any of the classes that throw a "TooManyClauses" exception
> (e.g. SpanRegexQuery, or SpanNearQuery with, say, WildcardQuery). The corpus
> is big enough that this is guaranteed to be thrown. So, currently I'm using
> a filter for wildcard queries, populating it via WildcardTermEnum and
> TermDocs... Works like a champ. But I don't see how to combine this with
> spans...
>
> It seems to me that spans are incompatible with filters; they're just
> different beasts. I see no way to incorporate spans and filters without doing
> actual work myself. So, it seems I'm left with several alternatives.
>
> 1> figure it out when creating the filter. Conceptually, for each document
> find the offsets of the terms I want to span, and find out if the distance
> between them fits my criteria and only add the doc to the filter if the
> distance is within my parameters.
>
> 2> Look at the docs returned by the current filtered process and, for each
> doc returned,
>   a> don't add if it doesn't fit my span criteria by examining the term
> positions.
>   b> re-query with a wildcard span, restricted by doc ID. I *think* that
> by restricting the query by (lucene) doc_id I'll be able to avoid the "too
> many clauses" issue. Assuming that I remember correctly and that the
> most-restrictive clause is honored when trying this....
>
> Guys, feel free to hop in here with just the names of the classes I really
> want to pay attention to <G>....
>
> I know this is scanty info, what I'm looking for is a very quick
> pointer.... What I'm especially looking for is "Just use the
> contrib/JustWhatYouWanted class" <G> although I poked around and didn't see
> anything...
>
> Thanks
> Erick
>
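
Alternatives 1 and 2a in the quoted message both come down to a per-document
proximity check on term positions. What follows is a minimal sketch of that
check in plain Java, assuming the positions of each matching term in the
document have already been read from the index (in the Lucene API of this era
that would likely be TermPositions.nextPosition(), but no Lucene calls appear
below). SpanCheck, unionSorted and withinDistance are hypothetical names, not
Lucene classes. The wildcard side is handled by merging the position lists of
every term the wildcard expands to, so no query clauses are built and nothing
here can throw TooManyClauses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: decides whether one document
// satisfies a "wildcard near term" criterion from raw position lists.
final class SpanCheck {

    // Merge the positions of every term the wildcard expanded to
    // (one int[] per term, all for the same document) into one sorted array.
    static int[] unionSorted(List<int[]> perTermPositions) {
        int total = 0;
        for (int[] p : perTermPositions) {
            total += p.length;
        }
        int[] all = new int[total];
        int k = 0;
        for (int[] p : perTermPositions) {
            System.arraycopy(p, 0, all, k, p.length);
            k += p.length;
        }
        Arrays.sort(all);
        return all;
    }

    // True if some position in a and some position in b differ by at most
    // maxDistance. Both arrays must be sorted ascending. Two-pointer scan:
    // always advance the pointer at the smaller position.
    static boolean withinDistance(int[] a, int[] b, int maxDistance) {
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (Math.abs(a[i] - b[j]) <= maxDistance) {
                return true;
            }
            if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }
}
```

For one document where the wildcard expands to two terms at positions {4, 30}
and {12}, and the other term occurs at {14}, calling
withinDistance(unionSorted(...), new int[] {14}, 2) returns true, because 12
and 14 are two positions apart. Only documents passing this check would be
added to the filter. Note that this measures the unordered absolute difference
between positions, which is not necessarily identical to how SpanNearQuery
counts slop.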
