lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: Highlighting PDF file after the search
Date Mon, 20 Sep 2004 22:02:55 GMT
Balasubramanian.Vijay@epamail.epa.gov wrote:

> 
> 
> 
> Hello,
> 
> I can successfully index and search the PDF documents, however i am not
> able to highlight the searched text in my original PDF file (ie: like
> dtSearch
> highlights on original file)
> 
> I took a look at the highlighter in sandbox, compiled it and have it
> ready.  I am wondering if this highlighter is for highlighting indexed
> documents or
> can it be used for PDF Files as is !  Please enlighten !

I did this a few weeks ago.

There are two ways, and they both revolve round the same thing, you need 
the tokenized PDF text available.

[a] Store the tokenized PDF text in the index, or in some other file on 
disk i.e. a "cache" ( but cache is a misleading term, as you can't have 
a cache miss unless you can do [b]).

[b] Tokenize it on the fly when you call getBestFragments() - the 1st 
arg, the TokenStream, should be one that takes a PDF file as input and 
tokenizes it.

http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/highlighter/build/docs/api/org/apache/lucene/search/highlight/Highlighter.html#getBestFragments(org.apache.lucene.analysis.TokenStream,%20java.lang.String,%20int,%20java.lang.String)
> 
> Thanks,
> 
> Vijay Balasubramanian
> DPRA Inc.,
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message