lucene-java-user mailing list archives

From "johan duflost" <>
Subject lucene highlighter
Date Thu, 23 Jun 2005 07:49:41 GMT

Dear list,

I am trying to use the term Highlighter in my webapp, but I have a problem.
I want to highlight the terms in a text without extracting the most relevant
fragments. The highlighting works, but the last characters are trimmed!

Here is a portion of my code:

  Analyzer analyzer = new StandardAnalyzer();
  Query query = null;
  try {
   query = QueryParser.parse(queryStr, "scientificName", analyzer);
   query = query.rewrite(IndexReader.open("E:/specimenset-index"));
  } catch (ParseException e) {
   // TODO Auto-generated catch block
  } catch (IOException e) {
   // TODO Auto-generated catch block
  }

  QueryScorer scorer = new QueryScorer(query);
  SimpleHTMLFormatter formatter = new SimpleHTMLFormatter(
    "<span class=\"highlight\">", "</span>");

  Highlighter highlighter = new Highlighter(formatter, scorer);

  TokenStream tokenStream = analyzer.tokenStream("scientificName",
    new StringReader(text));

  String highlightedText = null;

  try {
   highlightedText = highlighter.getBestFragment(
    tokenStream, text);
  } catch (IOException e1) {
   // TODO Auto-generated catch block
  }
  return highlightedText;

For instance, one value of the text variable is:
    <a href='taxoninfo.html?id=112'><span class='genus-species'>Capparimyia savastani</span> (Martelli)</a>

The corresponding value of the highlightedText variable is:
    <a href='taxoninfo.html?id=112'><span class='genus-species'><span class="highlight">Capparimyia</span> savastani</span> (Martelli

The trailing ")</a>" is trimmed for some mysterious reason! I tried playing
with the Encoder and Fragmenter classes, but without success.
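
To make the symptom concrete, the missing tail can be recovered after the fact with plain string arithmetic. This is only a sketch of a workaround, not a fix to the Highlighter itself. It assumes that the returned fragment starts at offset 0 of the text, that no Encoder escapes the output, and that the opening highlight tag does not otherwise occur in the text. The names HighlightTail and restoreTail are hypothetical and not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: re-appends the part of the
// original text that is missing from the end of the highlighted output.
final class HighlightTail {

    // Counts non-overlapping occurrences of needle in s.
    static int count(String s, String needle) {
        int n = 0;
        for (int i = s.indexOf(needle); i >= 0;
             i = s.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    // Assumes the fragment starts at offset 0 and the output is unescaped.
    static String restoreTail(String original, String highlighted,
                              String preTag, String postTag) {
        if (preTag.isEmpty()) {
            return highlighted;
        }
        // Each highlighted term adds exactly one preTag and one postTag.
        int added = count(highlighted, preTag)
                * (preTag.length() + postTag.length());
        // Number of characters of the original text present in the output.
        int consumed = highlighted.length() - added;
        if (consumed < 0 || consumed >= original.length()) {
            return highlighted; // nothing missing, or assumptions violated
        }
        return highlighted + original.substring(consumed);
    }
}
```

With the values shown above, calling HighlightTail.restoreTail(text, highlightedText, "<span class=\"highlight\">", "</span>") would append the missing ")</a>" to the highlighted string.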

Any help would be appreciated.

Best regards,


Johan Duflost
Belgian Biodiversity Information Facility (BeBIF)
Universite Libre de Bruxelles
