On Wednesday, November 5, 2003, at 03:51 AM, Marcel Stor wrote:
> Hi all,
>
> I'm thinkin' about writing a search tool for my filesystem. I know such
> things exist already but programming it myself is much more fun ;-)
> So, I would have Lucene crawl through my filesystem and pass each file
> to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
> system and would depend on the file ending to distinguish the file
> type.
> Is this a good idea in general? Is there a list of available indexers
> for the different file types? Any other comments are also welcome.
The general idea (limited to .txt files intentionally) is included in
this code:
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html
The Ant <index> task in the jakarta-lucene-sandbox CVS repository has a
document handler interface that is designed to allow for pluggability.
You named the PDF pieces, and there is POI for dealing with Office
documents.
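
To make the pluggable-handler idea concrete, here is a rough sketch of
dispatching on the file ending. It is not the Ant task's actual
interface: FileTypeDispatch, TextExtractor, and every method name below
are made up for illustration, and the only handler shown is for plain
text, assumed to be UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical registry that maps a file ending to a text extractor.
final class FileTypeDispatch {

    /** Hypothetical handler contract: turn one file into plain text for indexing. */
    interface TextExtractor {
        String extractText(Path file) throws IOException;
    }

    private final Map<String, TextExtractor> extractors = new HashMap<>();

    /** Registers a handler for an ending such as "txt" or "pdf" (case-insensitive). */
    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased ending after the last dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // A leading dot (".bashrc") or a trailing dot ("name.") is not an ending.
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the registered handler for this file name, or null if unsupported. */
    TextExtractor extractorFor(String fileName) {
        return extractors.get(extensionOf(fileName));
    }

    /** Walks the tree under root and collects the regular files that have a handler. */
    List<Path> crawl(Path root) throws IOException {
        List<Path> supported = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> extractorFor(p.getFileName().toString()) != null)
                 .forEach(supported::add);
        }
        return supported;
    }

    /** Simplest possible handler: reads the whole file as UTF-8 text (an assumption). */
    static TextExtractor plainText() {
        return file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

A PDF or Office handler would implement the same interface by wrapping
PDFBox or POI, and the extracted text would then go into a Lucene
Document. Those parts are left out here because the exact calls depend
on the library versions you use.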
Erik