lucene-java-user mailing list archives

From Paul Elschot
Subject Re: Sentence and Paragraph searching
Date Fri, 01 Jul 2005 20:06:35 GMT
On Friday 01 July 2005 20:52, McCallie,David wrote:
> Couldn't you use SpanQuery for something like this?  Put special
> <start-of-sentence> and <end-of-sentence> tokens around each sentence,
> and then search for the specific key words inside of the outer SPAN? Do
> the same for paragraphs, sections, etc.
> I tried this once, and it seemed to work.  I'm not sure of the
> performance penalty of the SPAN overhead.

It should work, and so should SpanNotQuery for excluding the
sentence boundary (see my other post). Using a separate
sentence field, in which every token of a sentence is indexed at the
position given by its sentence number, would be faster, but it would
also require a special version of PhraseQuery that matches terms at
the same position. Paragraphs can be handled similarly.
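
To illustrate the sentence field idea, here is a minimal sketch in plain
Java. It does not use the Lucene API: the class SentenceSearchSketch, its
methods, and the "<eos>" boundary token are hypothetical names chosen for
this example, and it handles a single document only. Each term is recorded
at the number of the sentence it occurs in, and a match means the query
terms share at least one such position, which is roughly what the special
PhraseQuery would have to check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SentenceSearchSketch {
    // Hypothetical marker token that ends a sentence in the token stream.
    static final String BOUNDARY = "<eos>";

    /** Builds the "sentence field": each term maps to the sentence numbers it occurs in. */
    static Map<String, Set<Integer>> sentenceField(List<String> tokens) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int sentence = 0;
        for (String token : tokens) {
            if (token.equals(BOUNDARY)) {
                sentence++; // the next token belongs to the next sentence
                continue;
            }
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(sentence);
        }
        return postings;
    }

    /** Returns true if all terms occur at a common position, i.e. in the same sentence. */
    static boolean sameSentence(Map<String, Set<Integer>> postings, String... terms) {
        Set<Integer> common = null;
        for (String term : terms) {
            Set<Integer> positions = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (common == null) {
                common = new HashSet<>(positions);
            } else {
                common.retainAll(positions); // keep only sentences shared by all terms so far
            }
        }
        return common != null && !common.isEmpty();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "lucene", "indexes", "text", BOUNDARY,
                "spans", "match", "positions", BOUNDARY);
        Map<String, Set<Integer>> field = sentenceField(tokens);
        System.out.println(sameSentence(field, "lucene", "text"));
        System.out.println(sameSentence(field, "lucene", "spans"));
    }
}
```

The same structure would serve for paragraphs by counting paragraph
boundaries instead of sentence boundaries.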

The disadvantage of adding a new field over the same data
is that the term index is duplicated.
This could be avoided by extending the index format
with index levels: one for normal use, one for sentences, one for
paragraphs, and so on.

Paul Elschot
