lucene-java-user mailing list archives

From markharw00d <>
Subject Highlighter: new support for encoding
Date Sun, 06 Feb 2005 22:11:36 GMT
Nicko Cadell was good enough to point out the issues involved in 
generating XHTML-compliant markup with the highlighter, and provided a 
patch to fix them.

The main code has now been updated in the new SVN repository here:

To encode your content, simply pass an encoder to the Highlighter, e.g.:

        //create an example doc for this test
        String myDocContent = "\"Smith & sons' prices < 3 and >4\" claims article";
        //Ordinarily you'd get the doc content like this..

        //create a query - you'd normally get this from QueryParser.parse
        Query myDocQuery=new TermQuery(new Term("contents","prices"));

        //Create a highlighter and pass a QueryScorer to provide the list of query tokens
        Highlighter highlighter = new Highlighter(new QueryScorer(myDocQuery));
        //set the choice of encoder to our simple encoder - otherwise the default is no encoding
        highlighter.setEncoder(new SimpleHTMLEncoder());
        //Tokenize the document content with an analyzer to get the token positions
        Analyzer analyzer=new WhitespaceAnalyzer();
        TokenStream tokenStream = analyzer.tokenStream("contents", new StringReader(myDocContent));
        //As a faster alternative to re-analyzing doc content you can
        //use "TokenSources" to take advantage of any pre-tokenized
        //content held in any term vectors:
        //Now pass the tokenStream to the highlighter to process
        String encodedSnippet = highlighter.getBestFragments(tokenStream, myDocContent,1,"...");
        //Should print &quot;Smith &amp; sons' <B>prices</B> &lt; 3 and &gt;4&quot; claims article
        System.out.println(encodedSnippet);
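
To make the encoding step concrete, here is a minimal standalone sketch of the kind of escaping an HTML encoder performs on the document text. It is not the highlighter's own code: the class name HtmlEscapeSketch and its encode method are hypothetical, and it assumes that only the four characters ", &, < and > are escaped, which matches the expected output quoted above (the apostrophe is left alone).

```java
// Hypothetical illustration only; not the Lucene SimpleHTMLEncoder source.
class HtmlEscapeSketch {

    // Replace the four characters that are unsafe in (X)HTML text
    // with their entity references; everything else passes through.
    static String encode(String plainText) {
        if (plainText == null || plainText.length() == 0) {
            return "";
        }
        StringBuilder result = new StringBuilder(plainText.length());
        for (int i = 0; i < plainText.length(); i++) {
            char ch = plainText.charAt(i);
            switch (ch) {
                case '"':
                    result.append("&quot;");
                    break;
                case '&':
                    result.append("&amp;");
                    break;
                case '<':
                    result.append("&lt;");
                    break;
                case '>':
                    result.append("&gt;");
                    break;
                default:
                    result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String myDocContent = "\"Smith & sons' prices < 3 and >4\" claims article";
        // prints &quot;Smith &amp; sons' prices &lt; 3 and &gt;4&quot; claims article
        System.out.println(encode(myDocContent));
    }
}
```

Note that in the highlighter output quoted above the <B> tags are left unescaped while the surrounding text is encoded, so the encoder is evidently applied to the document text and not to the highlight markup itself.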

