lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Giles <>
Subject Re: Title of PDF
Date Mon, 28 Jun 2004 15:21:29 GMT

I think you misunderstood Otis.  You will need to use some sort of parser 
(i.e. pdfbox, xpdf) to get the title text from the PDF (I assume you are 
indexing the documents, so you have this already).  Then you create a 
"title" field in your index and store the text of the title in there (so 
that you can return it with the results).  That is independent of what you 
do with the rest of the PDF document.


At 11:17 AM 6/28/2004, Don Vaillancourt wrote:
>It can't be that simple because the field will contain the whole PDF and 
>not just the title.  And for PDFs that are 3 or 4 megs, it is  really not 
>reasonable to store the who PDF in the collection just to get the title.

Save and share anything you find online - Furl @  

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message