Subject: RE: HTML pages highlighter
Date: Wed, 30 Mar 2005 16:13:55 -0500
From: "Yagnesh Shah"

Hi! Erik,

Here is what I used:

cd /opt/dynamo/prod/hww-doc/hww
java org.apache.lucene.demo.IndexHTML -create -index help/index help

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Wednesday, March 30, 2005 4:01 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter

How did you index "contents"?
If you did not use a stored field type, then that is the issue.

	Erik

On Mar 30, 2005, at 12:31 PM, Yagnesh Shah wrote:

> Hello Lucene-User,
> Has anyone tried highlighting HTML pages?
>
> I am trying to do this using the demo example from Keld H. Hansen's article
> "Unweaving a Tangled Web HTMLParser and Lucene", but I am getting a
> "null" value for text at line #47. Any idea?
>
>  1 package org.apache.lucene.search.highlight;
>  2
>  3 import java.io.StringReader;
>  4
>  5 import org.apache.lucene.analysis.Analyzer;
>  6 import org.apache.lucene.analysis.TokenStream;
>  7 import org.apache.lucene.analysis.standard.StandardAnalyzer;
>  8 import org.apache.lucene.queryParser.QueryParser;
>  9 import org.apache.lucene.search.Hits;
> 10 import org.apache.lucene.search.IndexSearcher;
> 11 import org.apache.lucene.search.Query;
> 12 import org.apache.lucene.search.highlight.Formatter;
> 13 import org.apache.lucene.search.highlight.Highlighter;
> 14 import org.apache.lucene.search.highlight.QueryScorer;
> 15 import org.apache.lucene.search.highlight.SimpleFragmenter;
> 16
> 17 public class Searcher {
> 18
> 19     static Query query;
> 20     static Hits hits;
> 21
> 22     private static final String FIELD_NAME = "contents";
> 23     private static final String indexDir = "/opt/dynamo/prod/hww-doc/hww/help/index";
> 24
> 25     private static Analyzer analyzer = new StandardAnalyzer();
> 26
> 27     public static void main(String[] args) throws Exception {
> 28
> 29         IndexSearcher is = new IndexSearcher(indexDir);
> 30         String searchCriteria = "scholarly";
> 31         query = QueryParser.parse(searchCriteria, "contents", analyzer);
> 32
> 33         hits = is.search(query);
> 34         System.out.println("found in: " + query +"\nhits-length:" +hits.length());
> 35
> 36         doStandardHighlights();
> 37
> 38         is.close();
> 39     }
> 40
> 41     static void doStandardHighlights() throws Exception {
> 42         Highlighter highlighter = new Highlighter(new MyBolder(), new QueryScorer(query));
> 43         System.out.println("Highlighter: " + highlighter +"\nhits-length:" +hits.length());
> 44         highlighter.setTextFragmenter(new SimpleFragmenter(20));
> 45         for (int i = 0; i < hits.length(); i++) {
> 46             System.out.println("URL " + (i + 1) + ": " + hits.doc(i).getField("path").stringValue());
> 47             String text = hits.doc(i).get("FIELD_NAME");
> 48             int maxNumFragmentsRequired = 2;
> 49             String fragmentSeparator = "...";
> 50             TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));
> 51
> 52             String result =
> 53                 highlighter.getBestFragments(
> 54                     tokenStream,
> 55                     text,
> 56                     maxNumFragmentsRequired,
> 57                     fragmentSeparator);
> 58             System.out.println("\tfound in: " + result);
> 59         }
> 60     }
> 61
> 62     private static class MyBolder implements Formatter {
> 63         public String highlightTerm(String originalText, TokenGroup group)
> 64         {
> 65             if(group.getTotalScore()<=0)
> 66             {
> 67                 return originalText;
> 68             }
> 69             return "" + originalText + "";
> 70         }
> 71     }
> 72
> 73 }
>
> Yagnesh N. Shah
> Senior Technology Engineer
> CS Dept., 4th Floor
> H. W. Wilson
> 950 University Avenue,
> Bronx NY 10452
> (718) 588 8400 x2721
> http://www.hwwilson.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
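
A note on the null at line #47, offered as a sketch rather than a confirmed fix. Two things in the thread could each produce it. Line 47 passes the string literal "FIELD_NAME" instead of the constant FIELD_NAME, so the lookup asks for a field that is literally named FIELD_NAME. And, as Erik says, even with the right name the value comes back null if "contents" was not indexed as a stored field. Either way the null then reaches new StringReader(text) on line 50, which throws a NullPointerException. The guard below fails earlier with a readable message. The names StoredFieldCheck and requireStored are hypothetical, and the Map only stands in for a retrieved document; nothing here is Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): rejects a null field value with a
// message that names the field, instead of letting the null travel further.
class StoredFieldCheck {

    // Returns the value unchanged when present, throws when it is null.
    static String requireStored(String fieldName, String value) {
        if (value == null) {
            throw new IllegalStateException(
                "Field '" + fieldName + "' returned null: it is either not stored "
                + "in the index or the field name is wrong");
        }
        return value;
    }

    public static void main(String[] args) {
        // Assumption for illustration: a document in which only "path" was stored.
        Map<String, String> storedFields = new HashMap<String, String>();
        storedFields.put("path", "help/index.html");

        // A stored field passes through untouched.
        System.out.println(requireStored("path", storedFields.get("path")));

        // An unstored (or misnamed) field is reported by name.
        try {
            requireStored("contents", storedFields.get("contents"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the posted code this would wrap the lookup on line 47, for example requireStored(FIELD_NAME, hits.doc(i).get(FIELD_NAME)), using the constant rather than the quoted literal. It only reports the problem; the text still has to be stored at index time for highlighting to have something to work on.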