lucene-dev mailing list archives

From carl...@bookandhammer.com
Subject Re: Indexing PDF - a potential solution
Date Sat, 12 Jan 2002 18:12:27 GMT
Hi Kelvin,

Thanks for the repost of your information. That was great to see.

If you have money to spend and really want high-quality PDF conversions,
you might want to talk with pdfsages (pdfsages.com). I talked with their
CTO, and it seems they have some technology that does a really good job
of dealing with complex PDF files. Also, I'm sure they can handle the
compression (the xpdf FAQ says xpdf handles LZW-compressed files on Unix
by using some command-line tool).


--Peter

On Friday, January 11, 2002, at 07:19 PM, Kelvin Tan wrote:

> Peter,
>
> I had something to say about indexing PDFs in a previous post.
>
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00280.html
>
> Regards,
> Kelvin
>
> ----- Original Message -----
> From: <carlson@bookandhammer.com>
> To: Lucene Developers List <lucene-dev@jakarta.apache.org>
> Sent: Saturday, January 12, 2002 5:24 AM
> Subject: Indexing PDF - a potential solution
>
>
>> Hi,
>> I know that some people have wanted to index PDF files.
>> I just heard about a pdf to text conversion utility called xpdf.
>>
>> http://www.foolabs.com/xpdf/
>>
>> I haven't tried this, and it's not java based, although it is open
>> source.
>>
>> I hope this helps
>>
>> --Peter
>>
>>
>> --
>> To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
>>
>>



