lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Sentence and Paragraph searching
Date Fri, 01 Jul 2005 20:06:35 GMT
On Friday 01 July 2005 20:52, McCallie,David wrote:
> 
> Couldn't you use SpanQuery for something like this?  Put special
> <start-of-sentence> and <end-of-sentence> tokens around each sentence,
> and then search for the specific key words inside of the outer SPAN? Do
> the same for paragraphs, sections, etc.
> 
> I tried this once, and it seemed to work.  I'm not sure of the
> performance penalty of the SPAN overhead.
> 
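The boundary-marker idea can be illustrated without Lucene itself. The following plain-Java simulation is a sketch under assumptions, not Lucene code. The marker strings `<s>`/`</s>`, the naive tokenizer that splits sentences on periods, and the class and method names are all hypothetical. It accepts a pair of term occurrences only when no marker lies between them. That is roughly what a `SpanNearQuery` over the two terms (with a generous slop) would do if it were wrapped in a `SpanNotQuery` that excludes the marker term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation (no Lucene dependency) of wrapping each sentence in
// marker tokens and requiring that matched terms are not separated by a marker.
class SentenceBoundarySketch {
    // Hypothetical marker tokens; any strings the analyzer never emits would do.
    static final String START = "<s>";
    static final String END = "</s>";

    // Naive hypothetical tokenizer: sentences end at '.', words split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : text.split("\\.")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) continue;
            tokens.add(START);
            tokens.addAll(Arrays.asList(trimmed.toLowerCase().split("\\s+")));
            tokens.add(END);
        }
        return tokens;
    }

    // True if some occurrence of a and some occurrence of b cover a token range
    // containing no marker, i.e. a "near" match not overlapping a boundary.
    static boolean sameSentence(List<String> tokens, String a, String b) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b)) continue;
                int lo = Math.min(i, j), hi = Math.max(i, j);
                boolean crossesBoundary = false;
                for (int k = lo; k <= hi; k++) {
                    String t = tokens.get(k);
                    if (t.equals(START) || t.equals(END)) {
                        crossesBoundary = true;
                        break;
                    }
                }
                if (!crossesBoundary) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Lucene indexes text. Spans track positions.");
        System.out.println(sameSentence(tokens, "lucene", "text"));      // true
        System.out.println(sameSentence(tokens, "lucene", "positions")); // false
    }
}
```

Every candidate pair is checked here, which hints at the span overhead mentioned above: the engine has to enumerate the near-spans before it can discard the ones that contain a boundary.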

It should work, and SpanNotQuery can be used to exclude matches
that cross a sentence boundary (see my other post). Using a separate
sentence field, in which every token of a sentence is indexed at the
same position (the sentence number), would be faster, but that would
also require a special version of PhraseQuery that matches terms
occurring at the same position.
Paragraphs can be handled similarly.
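The separate sentence field can be modelled in the same plain-Java way. In this sketch the class and method names are hypothetical and sentences are split naively on periods. Each term's "positions" in the sentence field are simply the numbers of the sentences it occurs in, so a same-position match reduces to intersecting those sets. In Lucene such a field would presumably be produced by an analyzer that gives the first token of each new sentence a position increment of 1 and every other token an increment of 0. That mechanism is an assumption for illustration, not something stated in the post.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model (no Lucene dependency) of a "sentence" field in which every
// token of a sentence is stored at one position: the sentence number.
class SentenceFieldSketch {
    // Hypothetical indexer: term -> sorted set of sentence numbers ("positions").
    static Map<String, Set<Integer>> index(String text) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int sentenceNumber = 0;
        for (String sentence : text.split("\\.")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) continue;
            for (String token : trimmed.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(sentenceNumber);
            }
            sentenceNumber++;
        }
        return postings;
    }

    // Same-position match: sentence numbers at which all the given terms occur.
    static Set<Integer> samePosition(Map<String, Set<Integer>> postings, String... terms) {
        Set<Integer> result = null;
        for (String term : terms) {
            Set<Integer> positions = postings.getOrDefault(term, new HashSet<>());
            if (result == null) result = new TreeSet<>(positions);
            else result.retainAll(positions);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings =
            index("Lucene indexes text. Spans track positions. Lucene uses spans.");
        System.out.println(samePosition(postings, "lucene", "spans")); // [2]
        System.out.println(samePosition(postings, "lucene", "text"));  // [0]
        System.out.println(samePosition(postings, "text", "spans"));   // []
    }
}
```

No boundary checks are needed here, only a merge of position lists, which is presumably where the speed advantage over the span approach would come from. A paragraph field could be built the same way with paragraph numbers as positions.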

The disadvantage of adding a new field over the same data
is that the term index is duplicated.
This could be avoided by extending the index format
with index levels: one for normal use, one for sentences, one for
paragraphs, and so on.

Regards,
Paul Elschot



