lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com>
Subject Re: Contrib Highlighter and Phrase search
Date Wed, 19 Mar 2008 22:44:06 GMT
On Wednesday 19 March 2008 18:28:15 Itamar Syn-Hershko wrote:
> 1. Build a Radix tree (PATRICIA) and populate it with all search terms.
> Phrase queries will be considered as one big string, regardless of their
> spaces.
>
> 2. Iterate through your text ignoring spaces and punctuation marks, and for
> each word start a Radix lookup letter by letter, e.g. for the word John you
> will initialize a traversal of the tree with j, then o, h, and so on. If
> "John Kennedy" was your term, the lookup will not ignore the space (I'd say
> you'd want to keep only one instance of it and ignore the rest). On a dead
> end (no relevant branch on the tree) just skip to the next space or
> punctuation mark and restart the process.
>
> 3. If an iteration has touched base - completed a lookup successfully, the
> MyHighlighter object will boldify or surround the text with a span. It is
> then possible to give the span different IDs based on the term (each term a
> different ID) so each term will be highlighted with a different color.
>
> This allows for fast and exact highlighting of large texts as well as
> smaller ones. I would love to hear any comments on the above.
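
The three quoted steps can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code of any existing
highlighter: it uses a plain character trie instead of a compressed
Radix/PATRICIA tree, the class name TrieHighlighter and the "term-N" CSS class
names are made up for the example, matching is case-insensitive, and the
output is not HTML-escaped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: letter-by-letter trie lookup over raw, unanalyzed text.
class TrieHighlighter {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        int termId = -1; // >= 0 when a complete search term ends at this node
    }

    private final Node root = new Node();

    // Step 1: store every term (phrases included) as one string in the trie.
    TrieHighlighter(List<String> terms) {
        for (int id = 0; id < terms.size(); id++) {
            Node node = root;
            for (char c : normalize(terms.get(id)).toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.termId = id;
        }
    }

    // Lowercase, and keep only one space per run of spaces or punctuation.
    private static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    private static boolean isWordChar(String text, int pos) {
        return pos < text.length() && Character.isLetterOrDigit(text.charAt(pos));
    }

    // Steps 2 and 3: walk the trie from each word start, wrap the longest match.
    String highlight(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (!isWordChar(text, i)) {          // copy separators unchanged
                out.append(text.charAt(i++));
                continue;
            }
            Node node = root;
            int j = i, bestEnd = -1, bestId = -1;
            while (j < text.length() && node != null) {
                if (isWordChar(text, j)) {
                    node = node.next.get(Character.toLowerCase(text.charAt(j)));
                    j++;
                } else {
                    // A run of spaces/punctuation counts as a single space.
                    while (j < text.length() && !isWordChar(text, j)) j++;
                    if (j == text.length()) break;
                    node = node.next.get(' ');
                }
                // Accept a match only if it ends on a word boundary.
                if (node != null && node.termId >= 0 && !isWordChar(text, j)) {
                    bestEnd = j;
                    bestId = node.termId;
                }
            }
            if (bestEnd >= 0) {
                out.append("<span class=\"term-").append(bestId).append("\">")
                   .append(text, i, bestEnd).append("</span>");
                i = bestEnd;
            } else {
                // Dead end: skip to the next space or punctuation mark.
                while (isWordChar(text, i)) out.append(text.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

With the terms "John Kennedy" and "Lucene", the phrase is matched across
repeated spaces, while a longer word such as "Johnny" is left alone because
the match must end on a word boundary.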

I guess this would work, on the assumption that no analysis was performed on
the text. Otherwise stemming would break it instantly, since the stemmed terms
would no longer match the letters of the original text.

But why not perform the same logic but with terms instead of letters?  Or 
would that be what the current highlighter is already doing?
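
As a rough sketch of what "terms instead of letters" could look like: the trie
below is keyed by analyzed terms rather than characters, so a stemmed query
term can still match an inflected word in the text. Everything here is
hypothetical. The class name TermTrieHighlighter and the `analyze` function
are made up for the example; `analyze` stands in for whatever analysis chain
the index uses and is not a Lucene API. Tokenization is a simple regex, and
the output is not HTML-escaped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the same trie walk, one analyzed term per step.
class TermTrieHighlighter {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        int queryId = -1; // >= 0 when a complete query ends at this node
    }

    private final Node root = new Node();
    private final UnaryOperator<String> analyze; // stand-in for real analysis

    TermTrieHighlighter(List<String> queries, UnaryOperator<String> analyze) {
        this.analyze = analyze;
        for (int id = 0; id < queries.size(); id++) {
            Node node = root;
            Matcher m = WORD.matcher(queries.get(id));
            while (m.find()) {
                node = node.next.computeIfAbsent(analyze.apply(m.group()), k -> new Node());
            }
            node.queryId = id;
        }
    }

    String highlight(String text) {
        // Record the start/end offsets of every token in the original text.
        List<int[]> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) tokens.add(new int[] {m.start(), m.end()});

        StringBuilder out = new StringBuilder();
        int copied = 0; // offset up to which the text has been emitted
        int t = 0;
        while (t < tokens.size()) {
            Node node = root;
            int bestLast = -1, bestId = -1;
            for (int u = t; u < tokens.size(); u++) {
                int[] tok = tokens.get(u);
                node = node.next.get(analyze.apply(text.substring(tok[0], tok[1])));
                if (node == null) break;               // dead end
                if (node.queryId >= 0) {               // longest match so far
                    bestLast = u;
                    bestId = node.queryId;
                }
            }
            if (bestLast < 0) {                        // no match at this token
                t++;
                continue;
            }
            int start = tokens.get(t)[0], end = tokens.get(bestLast)[1];
            out.append(text, copied, start)
               .append("<span class=\"term-").append(bestId).append("\">")
               .append(text, start, end).append("</span>");
            copied = end;
            t = bestLast + 1;
        }
        return out.append(text, copied, text.length()).toString();
    }
}
```

Because the text tokens and the query tokens go through the same `analyze`
function, the highlighted span is taken from the original text offsets, and
the analysis only affects what counts as a match.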

Daniel
