lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Ackley" <sack...@cfl.rr.com>
Subject Re: Exotic format indexing?
Date Thu, 30 Oct 2003 21:19:37 GMT
> Finally, a while back, somebody on this list mentioned quiet a
> different approach: simply read the raw binary document and go fishing
> for what looks like text. I would like to try that :)

I have tried that approach and it works ok. You end up with a bunch of junk
in with the useful stuff. It can clutter up your index and make searching
slower. There are a lot of file formats that don't store all of the text as
sequential text so it won't work. PDF is one, I know that PowerPoint is
another.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message