lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Document as Paramter (Rephrased)
Date Sat, 12 Nov 2005 08:05:44 GMT

On 11 Nov 2005, at 20:32, bib_lucene bib wrote:
> -- Text I want to highlight is stored in the file system and index
> -- I can search and highlight the searched terms in results page  
> ( just snippets)
> -- I have given a download link next to snippets ( which will   
> point to file I stored in ROOT webapp of tomcat)
>
> I understood the concept of NullFragmenter, sorry for repeating  
> myself...
> It is something like a google search. In google if I enter search  
> term "highlight" and click on search, I get back search results  
> with word "highlight" in bold. ( I can do that)

If you're using NullFragmenter (which I just committed to
contrib/highlighter) then you're not getting snippets, you're getting the
full text.  But this is unrelated to your main question.

> Now when I click on the links (Ex: GNU Source-highlight 2.2) which
> is http://www.gnu.org/software/src-highlite/source-highlight.html I
> want the term "highlight" in bold when the page is displayed (This
> I do not know how to do)
>
> I will index docs like html, pdf, word etc.
> As I have already extracted text using textminer etc, question is
> when I click on "show full document" link in search results page
> which I will give below the highlighted search snippet how can I
> pull out the full text of the document from the index. As there is
> no unique identifier for each doc in the index (?)

Well, as I said, from here on this is really in the domain of your
application, not Lucene or the Highlighter.  Perhaps your index
should have a unique identifying key field (Field.Keyword works well
for this).  Your links to the full text should then carry that key,
and they need to point to something dynamic, like a servlet, rather
than a static file.  The servlet receives the id of the document to
be highlighted and the query to use with Highlighter.
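
A minimal sketch of building such a link, assuming a hypothetical
servlet path and parameter names "id" and "q" (neither comes from
Lucene; pick whatever your application uses).  It only shows how the
key and the query travel in the URL:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class HighlightLink {
    /**
     * Builds a link to a (hypothetical) highlighting servlet.
     * docKey is the value of the unique key field stored in the index;
     * query is the user's original search string.
     */
    static String build(String servletPath, String docKey, String query) {
        try {
            // URL-encode both values so spaces, slashes, etc. survive the round trip.
            return servletPath
                + "?id=" + URLEncoder.encode(docKey, "UTF-8")
                + "&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The results page would emit this link next to each snippet instead
of the static download link.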

> Or do I just have to extract the text from the file stored in file  
> system and pass it to the highlighter.

Perhaps.  I cannot decide this for you as it is, again, your  
application business logic that determines where the text of a  
document is.  You already mentioned the text is in the Lucene index  
in your case, so you could pass the document id (or the path, or some  
unique key) to the highlighting servlet, retrieve that document from  
Lucene, highlight the text using the NullFragmenter, and serve up  
that highlighted text.
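
The flow on the servlet side could look roughly like the sketch
below.  To keep it self-contained it does not call Lucene at all: a
Map stands in for "look up the document by its unique key", and a
simple case-insensitive regex stands in for Highlighter with
NullFragmenter.  All class and method names are made up for
illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullTextHighlight {
    // Stand-in for the index: unique key -> stored full text.
    private final Map<String, String> textByKey = new HashMap<String, String>();

    void add(String key, String text) {
        textByKey.put(key, text);
    }

    /**
     * Returns the whole text of the document with the given key, with every
     * case-insensitive occurrence of term wrapped in <b>...</b>.
     * Returns null when no document has that key.
     * Note: a real page should also HTML-escape the text; omitted here.
     */
    String highlight(String key, String term) {
        String text = textByKey.get(key);
        if (text == null) {
            return null;
        }
        Matcher m = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Keep the matched text's original casing inside the bold tags.
            m.appendReplacement(sb, Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

In the real application the Map lookup becomes a search on the key
field, and the regex becomes a Highlighter configured with
NullFragmenter so the full text comes back rather than a snippet.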

> Here is the code snippet.
> LuceneHitHighlighter highlighter =
>     new LuceneHitHighlighter(queryStr, "snippet", "body");
> for (int i = 0; i < hits.size(); i++) {
>     Document doc = (Document) hits.get(i);
>     highlighter.doHighlight(doc);
>     out.println("SNIPPET: " + doc.get("snippet"));
>     out.println("<hr>");
> }

You're adding a field to the document in doHighlight - this is not
something I'd recommend.  Just so you know, the added field is not
persistent: it exists only on the in-memory Document and is never
written to the index.  I recommend simply returning the highlighted
String from that method rather than adding a field.
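
As a rough illustration of that shape, here is the loop with
doHighlight returning the String.  The HitHighlighter interface is a
hypothetical stand-in for your LuceneHitHighlighter, and a Map stands
in for the Lucene Document so the sketch compiles on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SnippetRenderer {
    /** Hypothetical stand-in for the poster's LuceneHitHighlighter. */
    interface HitHighlighter {
        // Returns the highlighted text instead of adding a field to the document.
        String doHighlight(Map<String, String> doc);
    }

    /** Produces the output lines for the results page; the documents are left unmodified. */
    static List<String> render(List<Map<String, String>> hits, HitHighlighter highlighter) {
        List<String> lines = new ArrayList<String>();
        for (Map<String, String> doc : hits) {
            String snippet = highlighter.doHighlight(doc);
            lines.add("SNIPPET: " + snippet);
            lines.add("<hr>");
        }
        return lines;
    }
}
```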

     Erik

