lucene-java-user mailing list archives

From "McCallie,David" <DMCCAL...@cerner.com>
Subject RE: Sentence and Paragraph searching
Date Fri, 01 Jul 2005 18:52:45 GMT

Couldn't you use SpanQuery for something like this? Put special
<start-of-sentence> and <end-of-sentence> tokens around each sentence,
and then search for the specific keywords inside the outer span? Do
the same for paragraphs, sections, etc.

I tried this once, and it seemed to work. I'm not sure how much
performance overhead the span queries add.
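A minimal plain-Java sketch of the marker idea, assuming hypothetical marker tokens "<s>" and "</s>" and simple whitespace tokenization. It does not use Lucene at all; it only models the containment check that span queries would perform against the index. In Lucene itself one would build this from span query classes (for example, a SpanNotQuery that discards SpanNearQuery matches overlapping a sentence-boundary token), and the exact classes available depend on the Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of sentence marker tokens; this does not use Lucene's span classes.
// The marker strings "<s>" and "</s>" are hypothetical choices.
class SentenceMarkerDemo {
    static final String START = "<s>";
    static final String END = "</s>";

    // Emits START, the sentence's lowercased words, then END, for each sentence in order.
    static List<String> tokenize(List<String> sentences) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : sentences) {
            tokens.add(START);
            tokens.addAll(Arrays.asList(sentence.toLowerCase().split("\\s+")));
            tokens.add(END);
        }
        return tokens;
    }

    // True if all keywords occur inside a single START..END span, i.e. in one sentence.
    static boolean inSameSentence(List<String> tokens, String... keywords) {
        int start = -1;
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.equals(START)) {
                start = i;
            } else if (token.equals(END) && start >= 0) {
                List<String> sentence = tokens.subList(start + 1, i);
                if (sentence.containsAll(Arrays.asList(keywords))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize(Arrays.asList(
                "Lucene indexes text quickly",
                "Search results are ranked"));
        System.out.println(inSameSentence(tokens, "lucene", "text"));    // true
        System.out.println(inSameSentence(tokens, "quickly", "search")); // false
    }
}
```

The same wrapping could be repeated with paragraph or section markers; each level just needs its own pair of marker tokens.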

--david



-----Original Message-----
From: Peter Laurinc [mailto:laurinc@felisconsulting.com]
Sent: Friday, July 01, 2005 10:46 AM
To: java-user@lucene.apache.org
Subject: RE: Sentence and Paragraph searching

Maybe the solution is to give each term not only a position but also
something like a vector. Then you can "vectorize" it: term 1 has vector
(1, 1) and term 2 has vector (1, 1), meaning paragraph 1, sentence 1 of
that paragraph, while term 3 has (1, 2). To restrict a query to a
paragraph or a sentence, you only specify which portion of the vector
must be the same.

Is this the way?
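The vector proposal can be modeled in plain Java as below. This is not a Lucene API; it is a sketch assuming whitespace tokenization and hypothetical PARAGRAPH/SENTENCE levels. Because sentence numbers restart in every paragraph, a query compares a prefix of the vector (paragraph only, or paragraph and sentence) rather than a single component.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the "position vector" proposal; this is not a Lucene API.
// Each token records (paragraph, sentence-within-paragraph) next to its position.
class VectorPositionDemo {
    static final int PARAGRAPH = 1; // compare the first vector component only
    static final int SENTENCE = 2;  // compare paragraph and sentence

    static final class Token {
        final String term;
        final int position;
        final int[] vector; // {paragraph, sentence}, both 1-based

        Token(String term, int position, int[] vector) {
            this.term = term;
            this.position = position;
            this.vector = vector;
        }
    }

    // paragraphs[p][s] is the text of sentence s in paragraph p.
    static List<Token> index(String[][] paragraphs) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (int p = 0; p < paragraphs.length; p++) {
            for (int s = 0; s < paragraphs[p].length; s++) {
                for (String word : paragraphs[p][s].toLowerCase().split("\\s+")) {
                    tokens.add(new Token(word, position++, new int[] {p + 1, s + 1}));
                }
            }
        }
        return tokens;
    }

    // True if some occurrences of t1 and t2 agree on the first `level` vector components.
    static boolean cooccur(List<Token> tokens, String t1, String t2, int level) {
        for (Token a : tokens) {
            if (!a.term.equals(t1)) continue;
            for (Token b : tokens) {
                if (b.term.equals(t2) && samePrefix(a.vector, b.vector, level)) return true;
            }
        }
        return false;
    }

    static boolean samePrefix(int[] x, int[] y, int level) {
        for (int i = 0; i < level; i++) {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Token> tokens = index(new String[][] {
            {"apache lucene", "fast search"},
            {"java library"}});
        System.out.println(cooccur(tokens, "lucene", "search", SENTENCE));  // false
        System.out.println(cooccur(tokens, "lucene", "search", PARAGRAPH)); // true
        System.out.println(cooccur(tokens, "lucene", "java", PARAGRAPH));   // false
    }
}
```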

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, July 01, 2005 4:04 PM
To: java-user@lucene.apache.org
Subject: Re: Sentence and Paragraph searching


On Jul 1, 2005, at 8:16 AM, Peter Laurinc wrote:

> Hi,
>
> I'm a newbie to Lucene.
> I want to ask how to implement a search for a phrase that must occur
> within a sentence or paragraph.
> I have seen some examples that change term positions, but I think
> that is not the right way, because it breaks classic proximity search
> (when one word is at the end of one sentence and the other is at the
> beginning of the next).

It really depends on your needs. If you never need proximity across
sentence boundaries, then what's the issue? Putting a large position
gap at sentence boundaries makes good sense for some needs. Maybe not
for your situation?
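A plain-Java sketch of the position-gap approach, assuming whitespace tokenization. In Lucene the gap would be introduced during analysis by giving the first token of each sentence a larger position increment; here SENTENCE_GAP = 100 is an arbitrary, hypothetical value that only has to exceed the largest distance you query with. The near() check uses a simple absolute position distance, which is simpler than Lucene's actual slop semantics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a position gap at sentence boundaries; not Lucene code.
class PositionGapDemo {
    static final int SENTENCE_GAP = 100; // hypothetical; must exceed any distance queried

    // Builds term -> positions. Normal increment is 1; the first word of every
    // sentence after the first is pushed ahead by `gap` instead.
    static Map<String, List<Integer>> index(String[] sentences, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = -1;
        for (int s = 0; s < sentences.length; s++) {
            String[] words = sentences[s].toLowerCase().split("\\s+");
            for (int w = 0; w < words.length; w++) {
                position += (s > 0 && w == 0) ? gap : 1;
                postings.computeIfAbsent(words[w], k -> new ArrayList<>()).add(position);
            }
        }
        return postings;
    }

    // True if some occurrences of a and b are at most maxDistance positions apart.
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        for (int pa : postings.getOrDefault(a, Collections.emptyList())) {
            for (int pb : postings.getOrDefault(b, Collections.emptyList())) {
                if (Math.abs(pa - pb) <= maxDistance) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] text = {"the index is fast", "search is simple"};
        Map<String, List<Integer>> gapped = index(text, SENTENCE_GAP);
        Map<String, List<Integer>> plain = index(text, 1);
        System.out.println(near(plain, "fast", "search", 5));  // true: adjacent across the boundary
        System.out.println(near(gapped, "fast", "search", 5)); // false: the gap blocks it
        System.out.println(near(gapped, "index", "fast", 5));  // true: same sentence
    }
}
```

This is exactly the trade-off raised in the original question: with the gap, proximity no longer works across sentence boundaries, which is only a problem if you actually need it.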

I'm definitely interested in what others have done with this sort of
thing.

At the extreme, if all you wanted was to find sentences and did not need
to query for terms in multiple sentences at one time then you could
index each sentence as a separate Document.
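A plain-Java sketch of indexing each sentence as its own document, assuming a naive regex sentence splitter and a hypothetical parentId field that links each sentence back to its source document. In Lucene each SentenceDoc would become a separate Document; as noted above, a single query then can no longer match terms spread across two sentences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of "one Document per sentence"; not Lucene code.
// parentId is a hypothetical field linking each sentence back to its source document.
class SentenceDocsDemo {
    static final class SentenceDoc {
        final String parentId;
        final int sentenceNo; // 0-based index of the sentence within its parent
        final Set<String> terms;

        SentenceDoc(String parentId, int sentenceNo, Set<String> terms) {
            this.parentId = parentId;
            this.sentenceNo = sentenceNo;
            this.terms = terms;
        }
    }

    // Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    static List<SentenceDoc> split(String parentId, String text) {
        List<SentenceDoc> docs = new ArrayList<>();
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");
        for (int i = 0; i < sentences.length; i++) {
            // Lowercase and drop punctuation before splitting into terms.
            String cleaned = sentences[i].toLowerCase().replaceAll("[^a-z0-9\\s]", "");
            Set<String> terms = new HashSet<>(Arrays.asList(cleaned.trim().split("\\s+")));
            docs.add(new SentenceDoc(parentId, i, terms));
        }
        return docs;
    }

    // AND query evaluated per sentence document: every term must occur in that one sentence.
    static List<SentenceDoc> search(List<SentenceDoc> docs, String... terms) {
        List<SentenceDoc> hits = new ArrayList<>();
        for (SentenceDoc doc : docs) {
            if (doc.terms.containsAll(Arrays.asList(terms))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<SentenceDoc> docs = split("doc1", "Lucene is a library. It supports span queries.");
        System.out.println(search(docs, "lucene", "span").size()); // 0: terms are in different sentences
        for (SentenceDoc hit : search(docs, "span", "queries")) {
            System.out.println(hit.parentId + " sentence " + hit.sentenceNo); // doc1 sentence 1
        }
    }
}
```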

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



CONFIDENTIALITY NOTICE

This message and any included attachments
are from Cerner Corporation and are intended
only for the addressee. The information
contained in this message is confidential and
may constitute inside or non-public information
under international, federal, or state
securities laws. Unauthorized forwarding,
printing, copying, distribution, or use of such
information is strictly prohibited and may be
unlawful. If you are not the addressee, please
promptly delete this message and notify the
sender of the delivery error by e-mail or you
may call Cerner's corporate offices in Kansas
City, Missouri, U.S.A at (+1) (816)221-1024.
------------------------------------------
