Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of lists@nabble.com designates
 216.139.236.158 as permitted sender)
Message-ID: <20971377.post@talk.nabble.com>
Date: Fri, 12 Dec 2008 00:34:28 -0800 (PST)
From: maxmil <mail@alwayssunny.com>
To: java-user@lucene.apache.org
Subject: Beginner: Best way to index and display orginal text of pdfs in
 search results
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Hi,

This is the first time i am using Lucene.

I need to index pdf's with very few fields, title, date and body (long
field) for a web based search.

The results i need to display have to show not only the documents found but
for each document a snapshot of the text where the search term has been
found. This is analogous to the way google displays search results. That is
to say

 ... some words and first instance of search Term and some more words ...
some more words second instance of search term and some more words... 

etc.

To do this i would need the original text of the document for each hit. As
far as i understand Lucene does not save the original text of the document
in the index.

I am not using a database and would prefer not to have to store the original
document text elsewhere.

One way i could do this would be to take the hits from Lucene and reopen
each pdf to extract the original text at run time however i fear that with
many results this would be very slow. 

What would you recommend me to do?

Thanks

max
-- 
View this message in context: http://www.nabble.com/Beginner%3A-Best-way-to-index-and-display-orginal-text-of-pdfs-in-search-results-tp20971377p20971377.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org