lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jerry Stern <yibing...@yahoo.com.cn>
Subject Re: Highlight the searched word when full-text searching performed
Date Mon, 28 Nov 2005 07:45:22 GMT
Thanks!
  Yes you're right. Highlight all hits at one time may cause problems. A hits paging method
is needed to avoid this.
  Another, if we read the contents of the original file into a string, passing it to the highlighter
at the searching stage, this also could cause problems when large original file met. I think
we can use a random access method to reduce the string size by locating the searched word,
and this may be a general problem needing to be solved.
   
  Stern

Erik Hatcher <erik@ehatchersolutions.com> wrote:
  
On 27 Nov 2005, at 00:24, Jerry Stern wrote:
> I wonder how to highlight the searched word when full-text 
> searching performed based on Lucene.
> At the indexing stage, the contents of a original file is 
> regarded as a FIELD of a Lucene document:
> private static void indexFile(File file, IndexWriter idxWriter)
> throws IOException {
> System.out.print("Indexing " + file.getCanonicalPath() + " ......");
>
> Document doc = new Document();
> doc.add(Field.Text("path", file.getAbsolutePath()));
> doc.add(Field.Text("contents", new FileReader(file)));
>
> idxWriter.addDocument(doc);
>
> System.out.println("indexed.");
> }
>
> At the searching stage:
> Highlighter highlighter = new Highlighter(new QueryScorer(query));
> for (int i = 0; i < hits.length(); i++)
> {
> String text = hits.doc(i).get("contents"); // the text = null.
>
> TokenStream tokenStream = analyzer.tokenStream("path",
> new StringReader(text));
> // Get 3 best fragments and seperate with a "..."
> String result = highlighter.getBestFragments(tokenStream,
> text, 3, "...");
> System.out.println(result);
> }
>
> The 'contents' field is not stored in index file, and it is not 
> reasonable to store it in index file. So the red line of code can 
> not get the 'contents' field from the index file.
> I think that the 'text' parameter for the 
> Highlighter.getBestFragments(..) method must be the context string 
> of the searched word. So my question is how can I get the context 
> string of the searched word?

In your case, you'll need to get the "path" field (since that is 
being stored) and then load the file into a String to pass to the 
highlighter. The text to highlight must be stored somewhere, and in 
your case it is on the filesystem only.

Be carefuly - you're highlighting all hits there, which will have 
issues if you get a lot of hits.

Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message