lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: Google Desktop Could be Better
Date Sun, 17 Oct 2004 04:51:57 GMT

The latest PDFBox jar is 2179K, as you point out is significantly larger
than the jar in Parsnips.  The majority of that space is used by cmap
mapping files used for proper text extraction so any classes that could be
removed would only result in a minor size reduction.  I would think that
the capability of indexing PDF documents would outweigh the extra time for
the download.

Ben




On Sat, 16 Oct 2004, Bill Tschumy wrote:

>
> On Oct 16, 2004, at 9:47 PM, Ben Litchfield wrote:
>
> >
> >> types.  It uses Lucene underneath.  I'm thinking about extending it in
> >> the direction that Google Desktop is going and automatically index
> >> certain file types and directories in your system.
> >
> > And of course supporting PDF documents right!
> >
> > Ben
> > http://www.pdfbox.org
> >
>
> Ahem...  right...  My next version will do a better job with PDF and
> RTF files.  I've looked at pdfBox, but the jar file is so big that I
> hate to burden my users by incorporating it.  Any chance of getting a
> smaller version that just does the text extraction?  Your jar file is
> more than twice the size of my entire application including
> documentation.  I really would like to solve this problem.
> --
> Bill Tschumy
> Otherwise -- Austin, TX
> http://www.otherwise.com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message