lucene-java-user mailing list archives

From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: indexing and searching different file formats
Date Fri, 15 Feb 2002 03:52:01 GMT
Known limitations here:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00280.html

HTH.

Regards,
Kelvin

PS: The PJ library is GPL'ed. Commercial licenses go for $5,000 per 100 copies
(1 CPU per copy).

----- Original Message -----
From: "Kelvin Tan" <kelvin@relevanz.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>; <eliot@isogen.com>
Sent: Friday, February 15, 2002 9:09 AM
Subject: Re: indexing and searching different file formats


> Uhmmm, I can contribute something which does a pretty decent job if
> anyone's interested...
>
> Just have to clean it up a little...
>
> Regards,
> Kelvin
> ----- Original Message -----
> From: "W. Eliot Kimber" <eliot@isogen.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Friday, February 15, 2002 1:10 AM
> Subject: Re: indexing and searching different file formats
>
>
> > Andrew Libby wrote:
> >
> > > and the text needs to be retrieved for indexing. An extreme example is
> > > a PDF which has a considerably complicated document format.
> >
> > The PJ library from www.etymon.com provides a pretty complete and
> > easy-to-use API for getting info from PDF docs. It wouldn't be too hard
> > to write a PDF indexer for Lucene using this library. The main challenge
> > would be guessing word boundaries in strings where spaces have been
> > replaced with explicit shift values by the formatter.
> >
> > Cheers,
> >
> > Eliot
> > --
> > W. Eliot Kimber, eliot@isogen.com
> > Consultant, ISOGEN International
> >
> > 1016 La Posada Dr., Suite 240
> > Austin, TX  78752 Phone: 512.656.4139
> >
> > --
> > To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
>
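
On the word-boundary problem Eliot mentions in the quoted message: a rough sketch of one possible heuristic, not anything taken from the PJ library or from Lucene. It assumes the text has already been pulled out as a mixed list of string fragments and numeric shift values, in the style of a PDF TJ array, where shifts are in thousandths of a text-space unit and a negative value widens the gap. The class name ShiftSpaceGuesser, the method rebuild, and the threshold of 200 are all made up for illustration; a real threshold would depend on the font's space width.

```java
import java.util.Arrays;
import java.util.List;

public class ShiftSpaceGuesser {

    /**
     * Rebuilds plain text from a TJ-style list of String fragments and
     * Number shifts. A shift whose magnitude in the widening (negative)
     * direction reaches gapThreshold is treated as a word boundary.
     * Both the input shape and the threshold are assumptions.
     */
    public static String rebuild(List<?> elements, double gapThreshold) {
        StringBuilder out = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof String) {
                out.append((String) element);
            } else if (element instanceof Number) {
                double shift = ((Number) element).doubleValue();
                boolean wideGap = -shift >= gapThreshold;
                boolean needsSpace = out.length() > 0
                        && out.charAt(out.length() - 1) != ' ';
                // Small shifts are kerning; only a wide gap becomes a space.
                if (wideGap && needsSpace) {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical fragments: a wide gap after "Hello", kerning inside "world".
        List<Object> tj = Arrays.<Object>asList("Hello", -300, "wor", -20, "ld");
        System.out.println(rebuild(tj, 200));
    }
}
```

The rebuilt string could then go into a Lucene text field as usual. The heuristic will misjudge tightly set or justified text, so the threshold would need tuning per document or per font.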
