lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: HTML pages highlighter
Date Wed, 06 Apr 2005 22:44:41 GMT
What file do those line numbers correspond to?  I'm lost.

Did the Lucene in Action highlighting code work for you?

	Erik

On Apr 6, 2005, at 6:16 PM, Yagnesh Shah wrote:

> Hi! Erik,
> 	Yes basic seems to be working.
> a) My problem is there is a chances that query is not present in 
> stored content of a file so sometimes I am getting empty strings at 
> line#106 so I have to put a special check at line#109 and line#126. I 
> guess this is not a problem. What you think?
> b) When I click on a doc path that was generated by line#120 and 
> line#121 The files that it open do not have a searched query 
> highlighted. Any suggestion for this? How I can do?
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Monday, April 04, 2005 8:45 PM
> To: java-user@lucene.apache.org
> Subject: Re: HTML pages highlighter
>
>
> On Apr 4, 2005, at 5:35 PM, Yagnesh Shah wrote:
>> 	I end up purchasing your book "Lucene in Action". I have downloaded
>> your code samples. I am able to retrieve "result" only some time.
>> Below is the code I have taken from Search.jhtml in lucene demo. I
>> have 2 problem
>>
>> a) I am unable to display "result" using
>> b) When I click on the title to retrieve document I do not see my
>> query highlighted.
>
> First things first.... get something very very simple working and
> expand from there.  Here is the simple code from our HighlightIt.java:
>
>      TermQuery query = new TermQuery(new Term("f", "ipsum"));
>      QueryScorer scorer = new QueryScorer(query);
>      SimpleHTMLFormatter formatter =
>          new SimpleHTMLFormatter("<span class=\"highlight\">",
>              "</span>");
>      Highlighter highlighter = new Highlighter(formatter, scorer);
>      Fragmenter fragmenter = new SimpleFragmenter(50);
>      highlighter.setTextFragmenter(fragmenter);
>
>      TokenStream tokenStream = new StandardAnalyzer()
>          .tokenStream("f", new StringReader(text));
>
>      String result =
>          highlighter.getBestFragments(tokenStream, text, 5, "...");
>
> One trick is that you must ensure the query you are passing to
> QueryScorer has been rewritten.  In our simple TermQuery case, that is
> not necessary, but in a general application it is.  You can call
> query.rewrite(reader) where reader is your IndexReader instance.  This
> ensures that range, fuzzy, and wildcard queries are expanded and
> highlightable.
>
> I'm not sure what is wrong with the code you are trying.  But again,
> start simple, just try out our HighlightIt or our HighlightTest.  If
> those work fine for you then move on to integrating further with your
> index.  Besides the Query.rewrite() trick, you have to be sure that the
> text you want to highlight is available.  If you're pulling it from the
> index, it must be in a stored field, otherwise you need to retrieve it
> from elsewhere.
>
> 	Erik
>
>
>>
>> <java>
>>
>>   Searcher searcher = new IndexSearcher(getReader(indexName));
>>
>>   // get query from request
>>   String queryString = request.getParameter("query");
>>
>>   query = QueryParser.parse(queryString, "contents", analyzer);
>>   Hits hits = searcher.search(query);
>>   SimpleHTMLFormatter formatter =
>>   new SimpleHTMLFormatter();
>>   Highlighter highlighter = new Highlighter(formatter, new
>> QueryScorer(query));
>>   highlighter.setTextFragmenter(new SimpleFragmenter(50));
>>   String FIELD_NAME = "contents";
>>
>>   for (int i = start; i < end; i++) {             // display the hits
>>   Document doc = hits.doc(i);
>>   String text = hits.doc(i).get(FIELD_NAME);
>>   int maxNumFragmentsRequired = 5;
>>   String fragmentSeparator = "...";
>>   if ( text != null){
>> 	TokenStream tokenStream = new
>> StandardAnalyzer().tokenStream(FIELD_NAME, new
>> java.io.StringReader(text));
>> 	String result =
>> highlighter.getBestFragments		
>> (tokenStream,text,maxNumFragmentsRequired,fragmentSeparator);
>>     	System.out.println("result=" +result);
>>   }
>>
>>     String title = doc.get("title");
>>     if (title.equals(""))                         // use url for docs
>> w/o title
>>       title = doc.get("path");
>>     </java>
>>     <p><b><java type=print>(int)(hits.score(i) * 100.0f)</java>%
>>     <a href="`doc.get("path")`">
>>     <java type=print>Entities.encode(title)</java>
>>     </b></a>
>>     <java>
>>     if (showSummaries) {                          // maybe show 
>> summary
>>     </java>
>>     <ul><i>Summary</i>:
>>       <java type=print>Entities.encode(doc.get("summary"))</java>
>>     </ul>
>>     <java>
>>     }
>>   }
>> </java>
>>
>>
>>
>> -----Original Message-----
>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>> Sent: Thursday, March 31, 2005 8:04 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: HTML pages highlighter
>>
>>
>>
>> On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
>>>     try {
>>>       fis = new FileInputStream(f);
>>>       HTMLParser parser = new HTMLParser(fis);
>>>
>>>       // Add the tag-stripped contents as a Reader-valued Text field
>>> so it will
>>>       // get tokenized and indexed.
>>> //      doc.add(new Field("contents", parser.getReader()));
>>>       LineNumberReader reader = new
>>> LineNumberReader(parser.getReader());
>>>       for (String l = reader.readLine(); l != null; l =
>>> reader.readLine())
>>> //        System.out.println(l);
>>>       doc.add(Field.Text("contents", l));
>>
>> Notice that your loop here is adding a "contents" field for *every*
>> line read since that is where the first semi-colon is.
>>
>> Look at using Luke to explore your index.  Try indexing just a dummy
>> String:
>>
>> 	doc.add(Field.Text("contents", "some dummy text"));
>>
>> to show that it works.  Always always always simplify a complicated
>> situation by doing the most obvious thing that _should_ work.
>>
>> Also, the demo Lucene code is not really designed to be used in a
>> production application (sadly), so you're better off borrowing code
>> from the many articles or our book to begin with.
>>
>> 	Erik
>>
>>
>>>
>>>       // Add the summary as a field that is stored and returned with
>>>       // hit documents for display.
>>>       doc.add(new Field("summary", parser.getSummary(),
>>> Field.Store.YES, Field.Index.NO));
>>>
>>>       // Add the title as a field that it can be searched and that is
>>> stored.
>>>       doc.add(new Field("title", parser.getTitle(), Field.Store.YES,
>>> Field.Index.TOKENIZED));
>>>     }
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>>> Sent: Wednesday, March 30, 2005 7:38 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: HTML pages highlighter
>>>
>>>
>>>
>>> On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:
>>>
>>>> Hi! Eric,
>>>
>>> Erik - with a 'k' - Sorry, I let it slide once though :)
>>>
>>>> 	I try to modified that with this but I get compile error. Do you
>>>> have
>>>> any code snippet of highlighting code to pull the contents from the
>>>> original source?
>>>
>>> I have a whole book full of code examples :)
>>> http://www.lucenebook.com - Grab the source code and look in
>>> src/lia/tools at Highlight*.java
>>>
>>>>  or Do you know how I can do field store?
>>>>
>>>>       doc.add(new Field("contents", parser.getReader(),
>>>> Field.Store.YES, Field.Index.NO));
>>>
>>> You cannot store it with a Reader.  You need to use 
>>> Field.Text(String,
>>> String), or one of the other variations.
>>>
>>> 	Erik
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message