lucene-java-user mailing list archives

From Pierre GOSSE <>
Subject RE: Re:RE: About highlighter
Date Fri, 18 Mar 2011 08:27:18 GMT
For your field configuration, the TokenStream you get with getAnyTokenStream is built from the term vectors stored for the field.

What tokenizer do you use to populate your field? Have you checked with Luke that your term
vectors are OK?
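Besides Luke, the term-vector check can be scripted. The sketch below is plain Java and deliberately does not call Lucene: the hypothetical `TermOccurrence` class stands in for one (term, start offset, end offset) entry read from the field's term vector (in the Lucene 3.x API, offsets are exposed through `TermPositionVector`). It assumes an analyzer that only lowercases, with no stemming, so each offset span of the stored text should lowercase to exactly its term. Any entry that falls out of range or does not match points to offsets that don't line up with the text handed to the highlighter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one term-vector entry: an indexed term plus the
// character offsets [startOffset, endOffset) of one occurrence in the stored text.
final class TermOccurrence {
    final String term;
    final int startOffset;
    final int endOffset;

    TermOccurrence(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public String toString() {
        return term + "[" + startOffset + "," + endOffset + ")";
    }
}

final class TermOffsetCheck {
    // Returns every occurrence whose offsets fall outside the stored text, or whose
    // text span, lowercased (assumed lowercase-only analyzer), differs from the term.
    static List<TermOccurrence> findMismatches(String storedText, List<TermOccurrence> occurrences) {
        List<TermOccurrence> mismatches = new ArrayList<>();
        for (TermOccurrence occ : occurrences) {
            boolean inRange = occ.startOffset >= 0
                    && occ.startOffset <= occ.endOffset
                    && occ.endOffset <= storedText.length();
            if (!inRange) {
                mismatches.add(occ);
                continue;
            }
            String span = storedText.substring(occ.startOffset, occ.endOffset)
                    .toLowerCase(Locale.ROOT);
            if (!span.equals(occ.term)) {
                mismatches.add(occ);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        String text = "Lucene highlighting works on long essays";
        List<TermOccurrence> occurrences = List.of(
                new TermOccurrence("lucene", 0, 6),    // matches "Lucene"
                new TermOccurrence("long", 28, 32),    // shifted by one: " lon"
                new TermOccurrence("essays", 35, 41)); // ends past the text
        System.out.println(findMismatches(text, occurrences));
    }
}
```

An empty list means every sampled offset lines up with the stored text; the two shifted entries in `main` show what a broken term vector would look like.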

And which version of Lucene are you using? A change was recently made to this code for another issue
(apparently unrelated, but who knows?). See


From: Cescy []
Sent: Friday, March 18, 2011 07:32
To: java-user; Pierre GOSSE
Subject: Re:RE: About highlighter

Yes, I only search the "contents" field, and I can print the whole contents with doc.get("contents")
if there are any keywords in it. But if the text is too long, the keywords near the end of the
contents are not highlighted, as if the highlighter had a word limit.

document.add(new Field("contents", value, Field.Store.YES, Field.Index.ANALYZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));


------------------ Original ------------------
From:  "Pierre GOSSE"<>;
Date:  Thu, Mar 17, 2011 04:25 PM
To:  ""<>;
Subject:  RE: About highlighter

500 is the maximum size, in characters, of the text fragments returned by the highlighter. As far
as I understand the highlighter, it shouldn't be the problem here.
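The fragment size can be reasoned about with a simplified model. The plain-Java sketch below is not Lucene's `SimpleSpanFragmenter` (which also tries to keep a matched span inside one fragment); the `TokenBoundaryFragmenter` name is hypothetical and whitespace tokenization is assumed. It starts a new fragment only between tokens, once the next token would push the current fragment past `fragmentSize` characters, so every token, including one at the very end of a 2000-word text, lands in some fragment. In this model the fragment size changes where fragments break, not whether a late match can be found.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified fragmenter: not Lucene's SimpleSpanFragmenter.
final class TokenBoundaryFragmenter {
    // Groups whitespace-separated tokens into fragments of at most fragmentSize
    // characters (a single token longer than that becomes its own fragment).
    static List<String> fragment(String text, int fragmentSize) {
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Start a new fragment, between tokens, if this token would not fit.
            if (current.length() > 0 && current.length() + 1 + token.length() > fragmentSize) {
                fragments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
        }
        if (current.length() > 0) {
            fragments.add(current.toString());
        }
        return fragments;
    }

    // Index of the first fragment containing the given text, or -1 if none does.
    static int indexOfFragmentContaining(List<String> fragments, String text) {
        for (int i = 0; i < fragments.size(); i++) {
            if (fragments.get(i).contains(text)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A long text (2000 filler words) with the keyword at the very end.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("words ");
        }
        sb.append("keyword");
        List<String> fragments = fragment(sb.toString(), 500);
        System.out.println(fragments.size() + " fragments; keyword in fragment "
                + indexOfFragmentContaining(fragments, "keyword"));
    }
}
```

With a 500-character limit the keyword ends up in the last fragment rather than being dropped, which matches the view that the fragment size alone should not hide matches near the end of a long document.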

Gong li, how is the field "contents" defined? Is it the only field the search is run
against?


-----Original Message-----
From: Ian Lea []
Sent: Wednesday, March 16, 2011 22:29
To:
Subject: Re: About highlighter

I know nothing about highlighting, but that 500 looks like a good place
to start investigating.


On Tue, Mar 15, 2011 at 8:47 PM, Cescy <> wrote:
> Hi,
> My highlighting code is as follows:
>  QueryScorer scorer = new QueryScorer(query);
>  Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer);
>  highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));
>  String contents = doc.get("contents");
>  TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
>          topDocs.scoreDocs[i].doc, "contents", doc, analyzer);
>  String[] snippet = highlighter.getBestFragments(tokenStream, contents, 10);
> snippet holds the resulting fragments, which I then print to the screen.
> But if I search for a keyword in the last few paragraphs and the essay is long
> (1000-2000 words), it returns "document found" and snippet.length == 0 (i.e. the document
> is found but no highlighted context is). Why?
> How could I fix the problem?

To unsubscribe, e-mail:
For additional commands, e-mail:
