lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Index entire filesystem
Date Wed, 05 Nov 2003 09:57:19 GMT
On Wednesday, November 5, 2003, at 03:51  AM, Marcel Stor wrote:
> Hi all,
> I'm thinkin' about writing a search tool for my filesystem. I know such
> things exist already but programming it myself is much more fun ;-)
> So, I would have Lucene crawl through my filesystem and pass each file
> to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
> system and would depend on the file ending to distinguish the file type.
> Is this a good idea in general? Is there a list of available indexers for
> the different file types? Any other comments are also welcome.

The general idea (limited to .txt files intentionally) is included in 
this code:

The Ant <index> task in the jakarta-lucene-sandbox CVS repository has a
document handler interface that is designed to allow for pluggability.
You named the PDF pieces, and there is POI for dealing with Office
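
A minimal sketch of the extension-based dispatch discussed in this thread, written in plain Java with no Lucene calls. The FileSystemCrawler class and its FileHandler interface are hypothetical names made up for illustration; they are not the document handler interface of the Ant <index> task. The sketch only collects extracted text in a map, where a real indexer would presumably hand that text to Lucene, and only a .txt handler is registered; PDF or Office handlers would be plugged in the same way.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSystemCrawler {

    /** Hypothetical plug-in point: turns one file into the text to be indexed. */
    public interface FileHandler {
        String extractText(Path file) throws IOException;
    }

    // Maps a lower-case file extension (without the dot) to its handler.
    private final Map<String, FileHandler> handlers = new HashMap<>();

    public void register(String extension, FileHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the lower-case extension of the file name, or "" if it has none. */
    public static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Walks root recursively; returns path -> extracted text for files with a handler. */
    public Map<Path, String> crawl(Path root) throws IOException {
        List<Path> files;
        // Files.walk holds directory handles open, so the stream must be closed.
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, String> extracted = new TreeMap<>();
        for (Path file : files) {
            FileHandler handler = handlers.get(extensionOf(file));
            if (handler != null) {
                // Files without a registered handler are skipped silently.
                extracted.put(file, handler.extractText(file));
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        FileSystemCrawler crawler = new FileSystemCrawler();
        // Assumption: text files are UTF-8; malformed bytes are replaced, not rejected.
        crawler.register("txt",
                file -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Path, String> entry : crawler.crawl(root).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().length() + " chars");
        }
    }
}
```

Keying the handlers by extension keeps the crawler itself unaware of any file format, so supporting a new type is a single register call. The trade-off is the one raised in the question: a file with a wrong or missing extension is either skipped or passed to the wrong handler.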

