lucene-java-user mailing list archives

From "Ryan Ackley" <>
Subject Re: Exotic format indexing?
Date Thu, 30 Oct 2003 21:19:37 GMT
> Finally, a while back, somebody on this list mentioned quite a
> different approach: simply read the raw binary document and go fishing
> for what looks like text. I would like to try that :)

I have tried that approach and it works OK, but you end up with a bunch of
junk mixed in with the useful text. That junk can clutter up your index and
make searching slower. Also, a lot of file formats don't store all of their
text as sequential text, so it won't work for them. PDF is one, I know that PowerPoint is
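
For illustration, the "go fishing for what looks like text" idea can be sketched roughly as below. This is only a minimal sketch, not anything from the original thread: the class name TextFisher, the helper extractRuns, and the minimum run length of 4 are all assumptions made for the example, and it only recognizes printable ASCII. It scans the raw bytes and keeps runs of printable characters that are at least a given length, which is why short accidental runs (the "junk") still get through and why formats that compress or scatter their text yield little.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TextFisher {

    // Hypothetical helper: returns every run of at least minLen
    // consecutive printable ASCII bytes (0x20..0x7E) found in data.
    static List<String> extractRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Keep a run that extends to the end of the data.
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TextFisher <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // 4 is an arbitrary threshold chosen for this sketch.
        for (String run : extractRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

The extracted runs could then be fed to an indexer as the document body. Raising the minimum length cuts down on junk at the cost of dropping short real words.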
