lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Sentence and Paragraph searching
Date Fri, 01 Jul 2005 19:27:02 GMT
On Friday 01 July 2005 21:13, Dave Kor wrote:
> Quoting Peter Laurinc <laurinc@felisconsulting.com>:
> 
> > Hi,
> >
> > I'm newbie to lucene.
> > I wan to ask, how to implement search for phrase that must be in
> > sentence/paragraph.
> > I did see som examples, that uses term position changing, but I think
> > that this is not the way, because it breaks classic proximity search.
> > (if one word is on end and second of begining of next sentence)
> 
> Most NLP toolkits have a sentence and paragraph boundary detector that can be
> used to separate a single document into its constituent sentences and
> paragraphs. The two NLP toolkits I am familiar with are OpenNLP (Open Source)
> and Alias-i's LingPipe (Commercial) libraries. These toolkits can be used in
> several ways to achieve what you want.
> 
> If you are ONLY interested in searching for individual sentences, then you can
> use the toolkits to create an index of sentences instead of an index of
> documents.
> 
> Alternatively, you can encode the sentence boundaries found by these toolkits
> within the documents you are indexing, for example using special characters or
> as a separate field in Lucene. After every search, do an extra check to ensure
> that Lucene did not match across sentence boundaries.

Or try and use a SpanNotQuery to make sure that the sentence or paragraph
border is contained in matches in the same field.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message