lucene-java-user mailing list archives

From Michael Giles <mgi...@visionstudio.com>
Subject Re: Title of PDF
Date Mon, 28 Jun 2004 15:21:29 GMT
Don,

I think you misunderstood Otis.  You will need to use some sort of parser
(e.g. PDFBox, Xpdf) to extract the title text from the PDF (I assume you are
already indexing the documents, so you have a parser in place).  Then you
create a "title" field in your index and store only the title text in it (so
that you can return it with the search results).  That is independent of
what you do with the rest of the PDF document.
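
To make the stored-versus-indexed distinction concrete, here is a rough sketch in plain Java. It is a toy model, not Lucene's API: the class TitleIndexSketch and its methods add and search are made up for illustration, and it assumes your parser has already handed you the title and the body text as strings. The title is kept so it can be returned with hits; the body is only tokenized for searching and then thrown away.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (hypothetical names); this is NOT Lucene's API.
class TitleIndexSketch {
    // "Stored" data: one short title per document id.
    private final List<String> storedTitles = new ArrayList<>();
    // "Indexed" data: term -> ids of the documents whose body contains it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stores the title; tokenizes the body for search without keeping it. */
    int add(String title, String bodyText) {
        int docId = storedTitles.size();
        storedTitles.add(title);
        for (String term : bodyText.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId; // bodyText is not retained anywhere
    }

    /** Returns the stored titles of the documents whose body contains the term. */
    List<String> search(String term) {
        List<String> titles = new ArrayList<>();
        Set<Integer> hits =
            postings.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
        for (int docId : hits) {
            titles.add(storedTitles.get(docId));
        }
        return titles;
    }

    public static void main(String[] args) {
        TitleIndexSketch index = new TitleIndexSketch();
        index.add("Annual Report", "Revenue grew in 2004");
        index.add("User Manual", "Press the power button");
        System.out.println(index.search("revenue"));
    }
}
```

The storage cost per document is the length of the title, no matter how large the PDF is. In Lucene itself the same split would be a stored "title" field next to an unstored field for the body text.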

-Mike

At 11:17 AM 6/28/2004, Don Vaillancourt wrote:
>It can't be that simple because the field will contain the whole PDF and
>not just the title.  And for PDFs that are 3 or 4 megs, it is really not
>reasonable to store the whole PDF in the collection just to get the title.


