lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject wildcards and spans
Date Wed, 02 Aug 2006 15:29:54 GMT
I'm back, with another flavor of wildcards. What direction would you point a
poor boy who's project lead wants wildcard queries and spans? Here's the
problem....

I cannot use any of the classes that throw a "TooManyClauses" exception (e.g.
SpanRegexQuery or SpanNearQuery with, say WildCardQuery). The corpus is big
enough that this is guaranteed to be thrown. So, currently I'm using a
filter for wildcard queries, populating it via WildcardTermEnum and
TermDocs... Works like a champ. But I don't see how to combine this with
spans...

It seems to me that spans are incompatible with filters, they're just
different beasts. I see no way incorporate spans and filters without doing
actual work myself. So, it seems I'm left with several alternatives.

1> figure it out when creating the filter. Conceptually, for each document
find the offsets of the terms I want to span, and find out if the distance
between them fits my criteria and only add the doc to the filter if the
distance is within my parameters.

2> Look at the docs returned by the current filtered process and, for each
doc returned,
  a> don't add if it doesn't fit my span criteria by examining the term
positions.
  b> re-query with a wildcard span, restricted by doc ID. I *think* that by
restricting the query by (lucene) doc_id I'll be able to avoid the "too many
clauses" issue. Assuming that I remember correctly and that the
most-restrictive clause is honored when trying this....

guys, feel free to hop in here with just the names of the classes I really
want to pay attention to <G>....

I know this is scanty info, what I'm looking for is a very quick pointer....
What I'm especially looking for is "Just use the contrib/JustWhatYouWanted
class" <G> although I poked around and didn't see anything...

Thanks
Erick

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message