lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chas Emerick <cemer...@snowtide.com>
Subject Re: Google Desktop Could be Better
Date Thu, 21 Oct 2004 13:58:47 GMT
PDFTextStream's jar is < 400K.  This may not be relevant if Bill is 
only interested in open-source options, but I thought I'd put it out 
there anyway.

Chas Emerick

PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/

On Oct 17, 2004, at 12:51 AM, Ben Litchfield wrote:

>
> The latest PDFBox jar is 2179K, as you point out is significantly 
> larger
> than the jar in Parsnips.  The majority of that space is used by cmap
> mapping files used for proper text extraction so any classes that 
> could be
> removed would only result in a minor size reduction.  I would think 
> that
> the capability of indexing PDF documents would outweigh the extra time 
> for
> the download.
>
> Ben
>
>
>
>
> On Sat, 16 Oct 2004, Bill Tschumy wrote:
>
>>
>> On Oct 16, 2004, at 9:47 PM, Ben Litchfield wrote:
>>
>>>
>>>> types.  It uses Lucene underneath.  I'm thinking about extending it 
>>>> in
>>>> the direction that Google Desktop is going and automatically index
>>>> certain file types and directories in your system.
>>>
>>> And of course supporting PDF documents right!
>>>
>>> Ben
>>> http://www.pdfbox.org
>>>
>>
>> Ahem...  right...  My next version will do a better job with PDF and
>> RTF files.  I've looked at pdfBox, but the jar file is so big that I
>> hate to burden my users by incorporating it.  Any chance of getting a
>> smaller version that just does the text extraction?  Your jar file is
>> more than twice the size of my entire application including
>> documentation.  I really would like to solve this problem.
>> --
>> Bill Tschumy
>> Otherwise -- Austin, TX
>> http://www.otherwise.com
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message