lucene-general mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Indexing local PDFs: Lucene/Solr/Nutch ?
Date Sun, 14 Dec 2008 14:55:48 GMT
The trunk of Solr with the new ExtractingRequestHandler (Tika) will  
surely be the easiest way to get rolling.  A simple script that  
recurses your folders and issues a simple request posting each file in  
turn to Solr will give you a full text searchable index in no time  
(well, ok, it'll take a little time, but it'll be as fast as anything  
else out there).
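
A minimal sketch of such a script follows, in Python. It is an illustration rather than a tested recipe: the endpoint URL (http://localhost:8983/solr/update/extract), the use of each file's path as the document id via literal.id, and the helper names iter_files, build_extract_url and post_file are all assumptions to adjust for your own Solr configuration.

```python
import mimetypes
import os
import sys
import urllib.parse
import urllib.request

# Assumed endpoint; adjust host, port, and path to match your Solr setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def iter_files(root):
    """Yield the path of every file under root, recursing into subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def build_extract_url(base_url, doc_id, commit=False):
    """Build the request URL, passing doc_id as the document's unique id."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_file(path, base_url=SOLR_EXTRACT_URL, commit=False):
    """POST one file's raw bytes to Solr and return the HTTP status code."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(base_url, path, commit),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Usage (hypothetical script name): python post_folder.py /path/to/documents
    for path in iter_files(sys.argv[1]):
        try:
            post_file(path)
            print("indexed", path)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print("failed", path, exc, file=sys.stderr)
```

Documents only become searchable after a commit, so either pass commit=True on the final request or rely on an autoCommit setting. Committing on every file would slow a run of several hundred thousand documents considerably.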

	Erik

On Dec 14, 2008, at 9:15 AM, Veselin Kantsev wrote:

> Hello,
> first of all, thanks for these great projects.
> I discovered Lucene and its subprojects a day ago, and they all seem
> amazing.
>
> My goal:
> --------
> A file server with numerous folders containing documents
> (pdf, doc, txt, etc.)
> that need to be indexed and searchable via a web interface or similar.
> The number of files might be from 500 000 to 1 000 000 or so.
> Ideally the solution would be capable of handling a lot more than  
> that,
> in case of future growth.
>
> My question:
> ------------
> Which of the projects (Lucene, Solr, Nutch) will be most suitable in  
> my case?
>
> Thank you very much.
>
> --
> Veselin K

