From: "Yagnesh Shah"
Date: Wed, 30 Mar 2005 16:46:25 -0500
Subject: RE: HTML pages highlighter

Hi! Erik,
I tried to modify it as follows, but I get a compile error. Do you have a snippet of highlighting code that pulls the contents from the original source? Or do you know how I can store the field?

doc.add(new Field("contents", parser.getReader(), Field.Store.YES, Field.Index.NO));

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Wednesday, March 30, 2005 4:35 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter

On Mar 30, 2005, at 4:17 PM, Yagnesh Shah wrote:

> Hi! Erik,
> One more thing: I am using the same HTMLDocument.java that comes with
> /trunk/src/demo/org/apache/lucene/demo

Which does this:

    doc.add(new Field("contents", parser.getReader()));

That is not a stored field. In other words, the original "contents"
are not available from the Lucene index. You will have to adjust your
indexing code to store the contents, or adjust your highlighting code
to pull the contents from the original source again.

    Erik

>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, March 30, 2005 4:01 PM
> To: java-user@lucene.apache.org
> Subject: Re: HTML pages highlighter
>
> How did you index "contents"? If you did not use a stored field type,
> then that is the issue.
>
> Erik
>
> On Mar 30, 2005, at 12:31 PM, Yagnesh Shah wrote:
>
>> Hello Lucene-User,
>> Has anyone tried highlighting HTML pages?
>>
>> I am trying to do this using the demo example from Keld H. Hansen's article
>> "Unweaving a Tangled Web HTMLParser and Lucene", but I am getting a
>> "null" value for text at line #47. Any idea?
>>
>>  1 package org.apache.lucene.search.highlight;
>>  2
>>  3 import java.io.StringReader;
>>  4
>>  5 import org.apache.lucene.analysis.Analyzer;
>>  6 import org.apache.lucene.analysis.TokenStream;
>>  7 import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>  8 import org.apache.lucene.queryParser.QueryParser;
>>  9 import org.apache.lucene.search.Hits;
>> 10 import org.apache.lucene.search.IndexSearcher;
>> 11 import org.apache.lucene.search.Query;
>> 12 import org.apache.lucene.search.highlight.Formatter;
>> 13 import org.apache.lucene.search.highlight.Highlighter;
>> 14 import org.apache.lucene.search.highlight.QueryScorer;
>> 15 import org.apache.lucene.search.highlight.SimpleFragmenter;
>> 16
>> 17 public class Searcher {
>> 18
>> 19     static Query query;
>> 20     static Hits hits;
>> 21
>> 22     private static final String FIELD_NAME = "contents";
>> 23     private static final String indexDir = "/opt/dynamo/prod/hww-doc/hww/help/index";
>> 24
>> 25     private static Analyzer analyzer = new StandardAnalyzer();
>> 26
>> 27     public static void main(String[] args) throws Exception {
>> 28
>> 29         IndexSearcher is = new IndexSearcher(indexDir);
>> 30         String searchCriteria = "scholarly";
>> 31         query = QueryParser.parse(searchCriteria, "contents", analyzer);
>> 32
>> 33         hits = is.search(query);
>> 34         System.out.println("found in: " + query + "\nhits-length:" + hits.length());
>> 35
>> 36         doStandardHighlights();
>> 37
>> 38         is.close();
>> 39     }
>> 40
>> 41     static void doStandardHighlights() throws Exception {
>> 42         Highlighter highlighter = new Highlighter(new MyBolder(), new QueryScorer(query));
>> 43         System.out.println("Highlighter: " + highlighter + "\nhits-length:" + hits.length());
>> 44         highlighter.setTextFragmenter(new SimpleFragmenter(20));
>> 45         for (int i = 0; i < hits.length(); i++) {
>> 46             System.out.println("URL " + (i + 1) + ": " + hits.doc(i).getField("path").stringValue());
>> 47             String text = hits.doc(i).get(FIELD_NAME); // null unless "contents" was stored at index time
>> 48             int maxNumFragmentsRequired = 2;
>> 49             String fragmentSeparator = "...";
>> 50             TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));
>> 51
>> 52             String result =
>> 53                 highlighter.getBestFragments(
>> 54                     tokenStream,
>> 55                     text,
>> 56                     maxNumFragmentsRequired,
>> 57                     fragmentSeparator);
>> 58             System.out.println("\tfound in: " + result);
>> 59         }
>> 60     }
>> 61
>> 62     private static class MyBolder implements Formatter {
>> 63         public String highlightTerm(String originalText, TokenGroup group)
>> 64         {
>> 65             if (group.getTotalScore() <= 0)
>> 66             {
>> 67                 return originalText;
>> 68             }
>> 69             return "<b>" + originalText + "</b>";
>> 70         }
>> 71     }
>> 72
>> 73 }
>>
>> Yagnesh N. Shah
>> Senior Technology Engineer
>> CS Dept., 4th Floor
>> H. W. Wilson
>> 950 University Avenue,
>> Bronx NY 10452
>> (718) 588 8400 x2721
>> http://www.hwwilson.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
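Both of Erik's suggestions need the parser's output as a String rather than a Reader. Lucene indexes a Reader-valued field but never stores it, and no Field constructor takes a Reader together with Field.Store (Lucene 1.4.x has no Field.Store at all), which most likely explains the compile error; Field.Index.NO would also leave "contents" unsearchable. Below is a minimal sketch of a hypothetical helper, `ReaderText.readAll` (not part of Lucene or its demo), that drains a Reader into a String. The Lucene calls are left as comments because the right one depends on the Lucene version in use; the re-read path assumes the demo's HTMLParser can be built from an InputStream and that "path" is stored, as the code above already relies on.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical helper, not part of Lucene or its demo: drains a Reader into a String
 * so the text can be stored in the index or passed to Highlighter.getBestFragments().
 */
public class ReaderText {

    /** Reads every remaining character from the Reader, then closes it. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for parser.getReader() from the demo's HTMLParser.
        String text = readAll(new StringReader("Scholarly articles on indexing"));
        System.out.println(text); // prints: Scholarly articles on indexing

        // Index time (assumption: a Lucene version where Field.Store exists, e.g. 1.9/2.x):
        //   doc.add(new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED));
        // Lucene 1.4.x equivalent (stored, indexed and tokenized):
        //   doc.add(Field.Text("contents", text));
        //
        // Highlight time, re-reading the original page instead (assumes the demo's
        // HTMLParser accepts an InputStream and that "path" holds a readable file path):
        //   String path = hits.doc(i).get("path");
        //   String original = readAll(new HTMLParser(new FileInputStream(path)).getReader());
    }
}
```

Storing the text makes the index larger but lets the highlighter work from `hits.doc(i).get(FIELD_NAME)` alone; re-reading keeps the index small at the cost of parsing each hit's page again at search time.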