lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: indexing and searching different file formats
Date Fri, 15 Feb 2002 01:09:54 GMT
Uhmmm, I can contribute something which does a pretty decent job if anyone's
interested...

Just have to clean it up a little...

Regards,
Kelvin
----- Original Message -----
From: "W. Eliot Kimber" <eliot@isogen.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Friday, February 15, 2002 1:10 AM
Subject: Re: indexing and searching different file formats


> Andrew Libby wrote:
>
> > and the text needs to be retrieved for indexing.  An extreeme example is
> > a PDF which has a considerably complicated document format.
>
> The PJ library from www.etymon.com provides a pretty complete and
> easy-to-use API for getting info from PDF docs. It wouldn't be too hard
> to write a PDF indexer for Lucene using this library. The main challenge
> would be guessing word boundaries in strings where spaces have been
> replaced with explicit shift values by the formatter.
>
> Cheers,
>
> Eliot
> --
> W. Eliot Kimber, eliot@isogen.com
> Consultant, ISOGEN International
>
> 1016 La Posada Dr., Suite 240
> Austin, TX  78752 Phone: 512.656.4139
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message