lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Don Vaillancourt <d...@webimpact.com>
Subject PDF Indexing Issue
Date Mon, 28 Jun 2004 20:58:50 GMT
I used the following code example from an article that I linked off of 
jakarta's site to index PDF files:

doc.add(Field.Text("content", new FileReader(f)));

But I realized today that this method only indexes the PDF as is.  For 
those wondering if the the PDF were actually indexed or if maybe they only 
contained images, well I verified this with LUKE and those PDFs are in 
there, but the only keywords that were indexed were the PDF defintion 
statements and encoded stuff.

So what is the proper way to index a PDF?

Thank You


Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
416-815-2000 ext. 245
email: donv@web-impact.com
web: http://www.web-impact.com




This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright.  If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.












Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message