lucene-java-user mailing list archives

From "Itamar Syn-Hershko" <ita...@divrei-tora.com>
Subject RE: Contrib Highlighter and Phrase search
Date Wed, 19 Mar 2008 07:28:15 GMT

I'm not sure how the current Highlighter works - haven't had the time to
look into it yet - but I thought about the following implementation. Judging
by your question, this works in a slightly different way than the current
Highlighter:

1. Build a Radix tree (PATRICIA) and populate it with all search terms.
Each phrase query is inserted as one long string, even though it contains
spaces.

2. Iterate through your text, skipping the spaces and punctuation marks
between words, and for each word start a Radix lookup letter by letter. E.g.
for the word John you begin a traversal of the tree with j, then o, h, and
so on. If "John Kennedy" is one of your terms, the lookup must not ignore
the space after John (I'd say you'd want to keep only one instance of it and
ignore the rest). On a dead end (no relevant branch in the tree) just skip
to the next space or punctuation mark and restart the process.

3. If a traversal completes a lookup successfully, the MyHighlighter object
will bold the matched text or surround it with a span. Each term can then be
given its own span ID, so that each term is highlighted in a different
color.

This allows for fast and exact highlighting of large texts as well as
smaller ones. I would love to hear any comments on the above.
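
To make the three steps concrete, here is a minimal sketch in Java. It is
only an illustration under a few assumptions: it uses a plain
character-by-character trie instead of a compressed PATRICIA tree (same
lookups, less code), it lower-cases everything, it only accepts a match that
ends on a word boundary, and the class name TrieHighlighterSketch and the
hlN span classes are made up for the example. It does not touch any Lucene
API and does not escape HTML.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: trie-based highlighting of terms and phrases.
final class TrieHighlighterSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        int termId = -1; // >= 0 when a search term ends at this node
    }

    private final Node root = new Node();

    // Step 1: insert every term; a phrase goes in as one string,
    // with each run of whitespace collapsed to a single space.
    TrieHighlighterSketch(List<String> terms) {
        for (int id = 0; id < terms.size(); id++) {
            String key = terms.get(id).trim()
                    .toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
            Node n = root;
            for (char c : key.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.termId = id;
        }
    }

    private static boolean isWordChar(String s, int i) {
        return i >= 0 && i < s.length() && Character.isLetterOrDigit(s.charAt(i));
    }

    String highlight(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int end = -1, id = -1;
            // Step 2: start a lookup only at the first letter of a word.
            if (isWordChar(text, i) && !isWordChar(text, i - 1)) {
                Node n = root;
                int j = i;
                while (n != null && j < text.length()) {
                    if (isWordChar(text, j)) {
                        n = n.next.get(Character.toLowerCase(text.charAt(j)));
                        j++;
                        // Remember the longest term ending on a word boundary.
                        if (n != null && n.termId >= 0 && !isWordChar(text, j)) {
                            end = j;
                            id = n.termId;
                        }
                    } else {
                        // A run of spaces/punctuation counts as one space.
                        while (j < text.length() && !isWordChar(text, j)) j++;
                        n = n.next.get(' ');
                    }
                }
            }
            if (end >= 0) {
                // Step 3: wrap the match in a span whose class names the term.
                out.append("<span class=\"hl").append(id).append("\">")
                   .append(text, i, end).append("</span>");
                i = end;
            } else {
                // Dead end: copy this character; the lookup restarts
                // at the next word start.
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TrieHighlighterSketch h =
                new TrieHighlighterSketch(List.of("John Kennedy"));
        System.out.println(h.highlight("John Kennedy has been John shot"));
    }
}
```

With "John Kennedy" as the only term, the second, stand-alone John in "John
Kennedy has been John shot" is left unhighlighted, because the traversal
from it dies at "shot" before reaching the end of the phrase.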

Itamar.

-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com] 
Sent: Tuesday, March 18, 2008 10:51 PM
To: java-user@lucene.apache.org
Subject: Re: Contrib Highlighter and Phrase search

The contrib Highlighter is not position sensitive. You can try out the patch
I have been working on here if you are interested:
https://issues.apache.org/jira/browse/LUCENE-794

Spencer Tickner wrote:
> Hi List,
>
> Thanks in advance for any help. I'm working with the contrib 
> highlighting class and am having issues when doing searches with a 
> phrase. I've been able to duplicate this behaviour in the 
> HighlighterTest class.
>
> When calling the testGetBestFragmentsPhrase() method I get the correct:
>
> <b>John</b>  <b>Kennedy</b>  has been shot
>
> However when I add "John" to the text so it reads:
>
> "John Kennedy has been John shot"
>
> I get:
>
> <b>John</b>  <b>Kennedy</b>  has been <b>John</b> shot
>
> It makes sense to me that John should not be highlighted. Has anyone 
> else run into this problem? Anyone have a fix?
>
> Cheers,
>
> Spencer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>    







