lucenenet-user mailing list archives

From Michael Mitiaguin <mitiag...@gmail.com>
Subject Re: Multi-color Highlighting: Term problem
Date Wed, 01 Jul 2009 02:54:04 GMT
I guess you could look for alternative highlighters among the contributions for
Java Lucene. About 2 years ago I used one that was faster (it required indexing
with term vectors) and highlighted phrase searches properly. As far as I
know, the most common highlighter doesn't handle phrases correctly: any word
from the searched phrase is highlighted on its own. As for your problem, you
could try a stemming analyzer when indexing, but I'm not sure whether it is
relevant or whether it will help.
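
To make the idea of tying every variant back to the query term it came from
more concrete, here is a rough, library-free sketch in Java. It does not use
Lucene.Net.Highlight or the Java highlighter at all. The class name, the
palette, and the whitespace/prefix handling of the query are all assumptions
made for illustration. The only point is that the color is chosen by the
original clause ("agree*"), not by the expanded term ("agreed", "agreement").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Lucene distribution.
class ClauseColorHighlighter {
    // Assumed palette: one color per query clause, reused cyclically.
    private static final String[] COLORS = {"yellow", "lightgreen", "lightblue"};

    // Returns the index of the first clause matching the token, or -1.
    // A clause ending in '*' is treated as a simple prefix pattern.
    public static int matchingClause(String[] clauses, String token) {
        String lower = token.toLowerCase();
        for (int i = 0; i < clauses.length; i++) {
            String clause = clauses[i].toLowerCase();
            boolean matches = clause.endsWith("*")
                    ? lower.startsWith(clause.substring(0, clause.length() - 1))
                    : lower.equals(clause);
            if (matches) {
                return i;
            }
        }
        return -1;
    }

    // Wraps each matching word in a span whose color depends on the clause.
    // Text between words is copied unchanged (no HTML escaping is done here).
    public static String highlight(String query, String text) {
        String[] clauses = query.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        Matcher words = Pattern.compile("\\w+").matcher(text);
        int last = 0;
        while (words.find()) {
            out.append(text, last, words.start());
            String token = words.group();
            int clause = matchingClause(clauses, token);
            if (clause >= 0) {
                out.append("<span style=\"background:")
                   .append(COLORS[clause % COLORS.length])
                   .append("\">").append(token).append("</span>");
            } else {
                out.append(token);
            }
            last = words.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            highlight("merger agree*", "The merger agreement was agreed."));
    }
}
```

With the query "merger agree*", both "agreement" and "agreed" resolve to
clause index 1 and so share one color. In a real setup the same grouping would
presumably be done by building one highlighter per original query clause
rather than one per extracted term, but I have not tried that with
Lucene.NET 2.0.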

On Wed, Jun 24, 2009 at 4:36 PM, Nitin Shiralkar <nitins@coreobjects.com> wrote:

> Hi All,
>
> We are trying to implement multi-color highlighting in our Lucene.NET
> (v2.0) based search engine, using the "Lucene.Net.Highlight" library.
> Since it has no support for multi-color highlighting, we
> are doing it indirectly by extracting each term in the search query and
> highlighting it individually with a separate formatter.
>
> For example:
>
> String strQuery = "merger agree*" (without quotes)
> ---
> WeightedTerm[] terms = QueryTermExtractor.GetTerms(strQuery, false);
> ---
> ---
> SimpleHTMLFormatter formatter = new
> SimpleHTMLFormatter(_strFormatterStartTag[nFormatter], _strFormatterEndTag);
> --- loop to traverse each term ---
> WeightedTerm term = terms[nTerm];
> ---
> TermQuery termQuery = new TermQuery (new Term (FIELDNAME, term.GetTerm()));
> ---
> Highlighter highlighterContent = ---
>
> Problem:
>
> The above implementation works. However, all variations of the "agree*"
> query term, such as "agreements", "agreed", and "agreement", are highlighted
> in separate colors. I am not able to correlate these variations with the
> original term "agree*" so that they are highlighted in the same color.
>
> Can anyone suggest an alternative approach?
>
>
>
