lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yagnesh Shah" <ys...@hwwilson.com>
Subject RE: HTML pages highlighter
Date Mon, 04 Apr 2005 21:35:41 GMT
Hi! Eric,
	I end up purchasing your book "Lucene in Action". I have downloaded your code samples. I
am able to retrieve "result" only some time. Below is the code I have taken from Search.jhtml
in lucene demo. I have 2 problem

a) I am unable to display "result" using
b) When I click on the title to retrieve document I do not see my query highlighted.

<java>

  Searcher searcher = new IndexSearcher(getReader(indexName));

  // get query from request
  String queryString = request.getParameter("query");

  query = QueryParser.parse(queryString, "contents", analyzer);
  Hits hits = searcher.search(query);
  SimpleHTMLFormatter formatter =
  new SimpleHTMLFormatter();
  Highlighter highlighter = new Highlighter(formatter, new QueryScorer(query));
  highlighter.setTextFragmenter(new SimpleFragmenter(50));
  String FIELD_NAME = "contents";

  for (int i = start; i < end; i++) {             // display the hits
  Document doc = hits.doc(i);
  String text = hits.doc(i).get(FIELD_NAME);
  int maxNumFragmentsRequired = 5;
  String fragmentSeparator = "...";
  if ( text != null){
	TokenStream tokenStream = new StandardAnalyzer().tokenStream(FIELD_NAME, new java.io.StringReader(text));
	String result = highlighter.getBestFragments		(tokenStream,text,maxNumFragmentsRequired,fragmentSeparator);
    	System.out.println("result=" +result);
  }

    String title = doc.get("title");
    if (title.equals(""))                         // use url for docs w/o title
      title = doc.get("path");
    </java>
    <p><b><java type=print>(int)(hits.score(i) * 100.0f)</java>%
    <a href="`doc.get("path")`">
    <java type=print>Entities.encode(title)</java>
    </b></a>
    <java>
    if (showSummaries) {                          // maybe show summary
    </java>
    <ul><i>Summary</i>:
      <java type=print>Entities.encode(doc.get("summary"))</java>
    </ul>
    <java>
    }
  }
</java>



-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Thursday, March 31, 2005 8:04 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter



On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
>     try {
>       fis = new FileInputStream(f);
>       HTMLParser parser = new HTMLParser(fis);
>
>       // Add the tag-stripped contents as a Reader-valued Text field 
> so it will
>       // get tokenized and indexed.
> //      doc.add(new Field("contents", parser.getReader()));
>       LineNumberReader reader = new 
> LineNumberReader(parser.getReader());
>       for (String l = reader.readLine(); l != null; l = 
> reader.readLine())
> //        System.out.println(l);
>       doc.add(Field.Text("contents", l));

Notice that your loop here is adding a "contents" field for *every* 
line read since that is where the first semi-colon is.

Look at using Luke to explore your index.  Try indexing just a dummy 
String:

	doc.add(Field.Text("contents", "some dummy text"));

to show that it works.  Always always always simplify a complicated 
situation by doing the most obvious thing that _should_ work.

Also, the demo Lucene code is not really designed to be used in a 
production application (sadly), so you're better off borrowing code 
from the many articles or our book to begin with.

	Erik


>
>       // Add the summary as a field that is stored and returned with
>       // hit documents for display.
>       doc.add(new Field("summary", parser.getSummary(), 
> Field.Store.YES, Field.Index.NO));
>
>       // Add the title as a field that it can be searched and that is 
> stored.
>       doc.add(new Field("title", parser.getTitle(), Field.Store.YES, 
> Field.Index.TOKENIZED));
>     }
>
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, March 30, 2005 7:38 PM
> To: java-user@lucene.apache.org
> Subject: Re: HTML pages highlighter
>
>
>
> On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:
>
>> Hi! Eric,
>
> Erik - with a 'k' - Sorry, I let it slide once though :)
>
>> 	I try to modified that with this but I get compile error. Do you have
>> any code snippet of highlighting code to pull the contents from the
>> original source?
>
> I have a whole book full of code examples :)
> http://www.lucenebook.com - Grab the source code and look in
> src/lia/tools at Highlight*.java
>
>>  or Do you know how I can do field store?
>>
>>       doc.add(new Field("contents", parser.getReader(),
>> Field.Store.YES, Field.Index.NO));
>
> You cannot store it with a Reader.  You need to use Field.Text(String,
> String), or one of the other variations.
>
> 	Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message