lucene-java-user mailing list archives

From petite_abeille <petite_abei...@mac.com>
Subject Re: Exotic format indexing?
Date Thu, 30 Oct 2003 20:12:01 GMT

On Oct 30, 2003, at 20:48, Ben Litchfield wrote:

> Unfortunately, it is not quite so easy.  I am not sure about Word documents

The raw text is visible.

> but PDFs usually have their contents compressed

Yep. PDF is really an image format ;)

> so a raw
> "fishing" around for text would be pointless.

That's alright. I can handle PDF separately if the need arises.

> Your best bet is to use a
> package like the one from textmining.org that handles various formats
> for you.

Perhaps. But I'm only looking for a "good enough" solution, not a 
perfect one :)
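
For illustration, a minimal sketch of the kind of raw "fishing" discussed above: scan the file's bytes and keep any run of printable ASCII characters above a minimum length, much like the Unix strings tool. The class name RawTextFisher, the method fish and the minLength parameter are made up for this example and are not part of Lucene or of the textmining.org package. It assumes the text is stored as plain 8-bit characters; text stored as 16-bit characters or compressed (as in most PDFs) would be missed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: collects runs of
// printable ASCII bytes from arbitrary binary data.
public class RawTextFisher {

    // Returns every run of printable ASCII (0x20..0x7E) that is at
    // least minLength characters long, in order of appearance.
    static List<String> fish(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Do not lose a run that reaches the end of the data.
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }
}
```

The resulting strings could then be joined and handed to whatever indexing code is already in place. A larger minLength cuts down on binary noise at the cost of dropping short words that sit between non-text bytes.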

Cheers,

PA.

