lucene-java-user mailing list archives

From Ned Regina <nrl...@ergito.com>
Subject Locating term in search results question
Date Fri, 19 Jul 2002 13:54:49 GMT
I need to locate a term in the text field of a document returned in a
search result.  I'm using regular expressions, but they're not always
accurate, and Lucene doesn't seem to index positional information.
Ideally, I would use the same algorithm that Lucene uses to match
documents in an index, but I don't know how to go about doing that.
I'm also concerned that re-scanning the text of every search result
could get very processor-intensive.

For example:

I've got a document with a field containing the following text:

"A chicken makes a lousy house pet."

When a user runs a search for "chicken", I'd like to be able to
accurately locate it within the results in order to highlight it.
This is a simple example that could easily be handled by a regular
expression.

However, if I've got the following text:

"Cytotoxic T cells (also known as killer T cells) possess the capacity
to lyse an infected target cell."

It becomes more difficult to accurately locate the term that caused
the match if the search text is "T cell".  The regular expressions
get more and more complicated, with a higher probability of
inaccuracy.  In this case, you would have to be sure not to match the
"t cell" that spans the end of "target" and the following "cell".
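
To make the "target cell" pitfall concrete, here is a minimal sketch of
a word-boundary regular expression in plain Java (java.util.regex only,
no Lucene API involved).  The names PhraseLocator, locate, and
extendLastWord are hypothetical, made up for this illustration.  The
extendLastWord flag is an assumption on my part: it lets the last query
word match as a prefix so that "T cell" also covers "T cells", a crude
stand-in for whatever stemming the index might apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds character offsets of a phrase in a text,
// requiring the phrase to start at a word boundary.
class PhraseLocator {

    // Builds a case-insensitive pattern for the phrase. Words are matched
    // literally and may be separated by any run of whitespace.
    static Pattern compile(String phrase, boolean extendLastWord) {
        String[] words = phrase.trim().split("\\s+");
        // (?<!\w): the match must not start in the middle of a word.
        StringBuilder regex = new StringBuilder("(?<!\\w)");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(words[i]));
        }
        // Either swallow the rest of the last word ("cell" -> "cells"),
        // or insist that the last word ends exactly here.
        regex.append(extendLastWord ? "\\w*" : "(?!\\w)");
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Returns {start, end} offset pairs (end exclusive) for every hit.
    static List<int[]> locate(String text, String phrase, boolean extendLastWord) {
        List<int[]> hits = new ArrayList<>();
        Matcher m = compile(phrase, extendLastWord).matcher(text);
        while (m.find()) {
            hits.add(new int[] {m.start(), m.end()});
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Cytotoxic T cells (also known as killer T cells) possess "
                + "the capacity to lyse an infected target cell.";
        for (int[] hit : locate(text, "T cell", true)) {
            // Prints "10-17 T cells" and "40-47 T cells"; "target cell" is skipped.
            System.out.println(hit[0] + "-" + hit[1] + " " + text.substring(hit[0], hit[1]));
        }
    }
}
```

This only approximates what the index does: it knows nothing about the
analyzer's tokenization, stop words, or stemming, so it can still
disagree with the actual match.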

If Lucene had a facility to determine the position of a term in the
text, it would be much easier to highlight.  Any suggestions would be
great.  Thanks.


Ned Regina
www.ergito.com

