lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Tue, 18 Dec 2012 11:39:51 GMT
On Thu, Dec 13, 2012 at 10:09 AM, Glen Newton <glen.newton@gmail.com> wrote:
>>Unfortunately, Lucene doesn't properly index
> spans (it records the start position but not the end position), so
> that limits what kind of matching you can do at search time.
>
> If this could be fixed (i.e. indexing the _end_ of a span) I think all
> the things that I want to do, and the things that can now be done in
> GATE very easily, would be possible using Mike's suggested method.

What would you use the end of the span for?

For example, do you need to do the equivalent of and end-of-span-aware
PhraseQuery?

Ie, so that if the document is "wireless network is down", and I apply
the synonym "wireless network" -> "wifi" at indexing time, then the
end-span-aware-PhraseQuery would match "wifi is down" (unlike today).

If you stuff the end of the span into the payload you'd have to create
a custom variant of PhraseQuery to properly match based on the end
span.

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message