lucene-java-user mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Highlighter: new support for encoding
Date Sun, 06 Feb 2005 22:11:36 GMT
Nicko Cadell was good enough to point out the issues involved in 
generating XHTML-compliant markup with the highlighter, and provided a 
patch to fix them.

The main code has now been updated in the new SVN repository here: 
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/

To encode your content, simply pass an encoder to the Highlighter, e.g.:


        //imports needed by this snippet
        import java.io.StringReader;
        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.TokenStream;
        import org.apache.lucene.analysis.WhitespaceAnalyzer;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.TermQuery;
        import org.apache.lucene.search.highlight.Highlighter;
        import org.apache.lucene.search.highlight.QueryScorer;
        import org.apache.lucene.search.highlight.SimpleHTMLEncoder;

        //create an example doc for this test
        String myDocContent =
            "\"Smith & sons' prices < 3 and >4\" claims article";
        //Ordinarily you'd get the doc content like this:
        //myDocContent=hits.doc(i).get(FIELD_NAME);

        //create a query - you'd normally get this from QueryParser.parse
        Query myDocQuery=new TermQuery(new Term("contents","prices"));

        //Create a highlighter and pass a QueryScorer to provide the
        //list of query tokens
        Highlighter highlighter =
            new Highlighter(new QueryScorer(myDocQuery));
        //set the choice of encoder to our simple encoder - otherwise
        //the default is no encoding
        highlighter.setEncoder(new SimpleHTMLEncoder());

        //Tokenize the document content to get the positions using an
        //analyzer:
        Analyzer analyzer=new WhitespaceAnalyzer();
        TokenStream tokenStream = analyzer.tokenStream("contents",
            new StringReader(myDocContent));

        //As a faster alternative to re-analyzing doc content you can
        //use "TokenSources" to take advantage of any pre-tokenized
        //content held in any term vectors:
        //TokenStream tokenStream=TokenSources.getAnyTokenStream(
        //    indexReader,docId,fieldName,analyzer);

        //Now pass the tokenStream to the highlighter to process
        String encodedSnippet =
            highlighter.getBestFragments(tokenStream, myDocContent,1,"...");
        System.out.println(encodedSnippet);
        //Should print:
        //&quot;Smith &amp; sons' <B>prices</B> &lt; 3 and &gt;4&quot; claims article
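
To make the expected output above easier to follow, here is a rough 
standalone sketch of the idea: the document text is escaped piece by 
piece, while the <B> tags around the hit are added unescaped. This is 
not the SimpleHTMLEncoder or Highlighter source. The class name 
HtmlEscapeSketch and both method names are invented for illustration, 
and the set of escaped characters (", &, <, >) is only inferred from the 
sample output, which leaves the apostrophe untouched.

```java
// Illustrative sketch only: NOT the Lucene SimpleHTMLEncoder source.
// Class and method names here are hypothetical.
final class HtmlEscapeSketch {

    // Escape the four characters the sample output shows as encoded.
    // The apostrophe is left as-is, matching that output.
    static String encodeText(String plainText) {
        if (plainText == null || plainText.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(plainText.length() + 16);
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '"': out.append("&quot;"); break;
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Wrap the first occurrence of term in <B> tags. The surrounding
    // text and the term itself are escaped; the tags are not.
    static String highlight(String text, String term) {
        int at = text.indexOf(term);
        if (at < 0) {
            return encodeText(text);
        }
        return encodeText(text.substring(0, at))
            + "<B>" + encodeText(term) + "</B>"
            + encodeText(text.substring(at + term.length()));
    }

    public static void main(String[] args) {
        String doc = "\"Smith & sons' prices < 3 and >4\" claims article";
        System.out.println(highlight(doc, "prices"));
    }
}
```

Escaping each piece before the tags are added is the important part: 
escaping the finished snippet afterwards would turn the <B> tags into 
&lt;B&gt; as well.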

Cheers
Mark


