lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From KK <>
Subject Re: hit highlighting in lucene ?
Date Thu, 21 May 2009 13:01:44 GMT
Initially I was using standardAnalyzer but I switched to simpleAnalyzer
which I guess doesnot do more that tokenizing[and may be tokenizing] and I
think this does not do stemming which I dont/cant do because I've no stemmer
for the languages I'm indexing.
For indexing and querring I'm using the same SimpelAnalyzer. So as you say I
can go for the standard highlighter api which I mentioned in my last mail,
and this will handle any language for highlighting support. I should start
using this one, right?

One more thing. I've a single indexer and searcher that I'm usign for
indexing pages of many different non-english languages and as I mentioned
earier I'm using simpleAnalyzer, does that mean If I index english pages
with the same indexer, it will not take care of stemming and stop word
removal? But I dont want to have multiple indexer that is specific to
languages. Cant we have a single indexer that handles non-eng and eng in
equally good ways? Or any other ideas on the same ?


On Thu, May 21, 2009 at 6:18 PM, Joel Halbert <> wrote:

> The highlighter should be language independent. So long as you are
> consistent with your use of Analyzer between
> indexing/query/highlighting.
> As for the most appropriate Analyzer to use for your local language,
> this is a seperate question - especially if you are using stop word and
> stemming filters.
> The StandardAnalyzer is designed for English since it used the
> StopFilter (English words only).
> -----Original Message-----
> From: KK <>
> Reply-To:
> To:
> Subject: hit highlighting in lucene ?
> Date: Thu, 21 May 2009 17:51:13 +0530
> Hi All,
> I was looking for various ways of implementing hit highlighting in Lucene
> and found some standard classes that does support highlighting like this
> *lucene**lucene*/search/*highlight*
> /package-summary.html<*lucene*/search/*highlight*%0A/package-summary.html>
> ik but what i believe is that this is only for english or does it support
> other languages. I actually wanted to support highlighting for some
> non-english languages which I'm able to index and fetch using utf-8
> encoding. So  this means that if I want to have highlighting then I've to
> get the utf-8 query and look for the same in the result and add apt tags
> whereever required, it essentially boils down to implementing the standard
> highlighter. I think the standard highlighter also supports other
> languages.
> Correct me if i'm wrong.
> Due to my requirement constraints I'm using just simpleAnalyzer and we dont
> have tokenizers for these regional languages. Any other ideas of doing the
> same would be helpful as well.
> Thanks,
> KK.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message