lucene-java-user mailing list archives

From Bill Janssen <jans...@parc.com>
Subject Re: Google Desktop Could be Better
Date Sun, 17 Oct 2004 11:09:18 GMT
Bill Tschumy writes:
> I've looked at pdfBox, but the jar file is so big that I 
> hate to burden my users by incorporating it.

Bill,

My system (see http://www.parc.com/janssen/pubs/TR-03-16.pdf) uses
pdftotext underneath.  I've been very satisfied with that.  A
Java-based alternative would be Multivalent
(multivalent.sourceforge.net).  Multivalent, by the way, advertises
the following:

"Extract text from all formats. Full-text search with Lucene."

Bill
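
For illustration, here is roughly how pdftotext could be driven from
Java before handing the text to an indexer.  This is only a sketch,
not the code from the system described in the tech report: the class
and method names (PdfTextExtractor, buildCommand, extractText) are
made up for this example, and it assumes a pdftotext binary is on the
PATH and a reasonably recent JDK (11 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: shells out to pdftotext and returns the extracted text.
public class PdfTextExtractor {

    // Build the command line. Passing "-" as the output file makes
    // pdftotext write the text to stdout instead of a .txt file.
    static List<String> buildCommand(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Run pdftotext on one PDF and capture its standard output as a String.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf))
                // Discard stderr so a chatty run cannot block on a full pipe.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String text;
        try (InputStream in = process.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with status " + exit);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PdfTextExtractor some-file.pdf
        System.out.println(extractText(Path.of(args[0])));
    }
}
```

The returned string can then go into whatever field of the Lucene
document holds the body text.  The appeal of this route is that
nothing large has to ship in the application jar, at the cost of
requiring pdftotext to be installed on the user's machine.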


