The trunk of Solr with the new ExtractingRequestHandler (Tika) will surely be the easiest way to get rolling. A simple script that recurses your folders and issues a simple request posting each file in turn to Solr will give you a full text searchable index in no time (well, ok, it'll take a little time, but it'll be as fast as anything else out there).

Erik

On Dec 14, 2008, at 9:15 AM, Veselin Kantsev wrote:

> Hello,
> first of all, thanks for these great projects.
> I discovered Lucene and its subs, a day ago and all these seem amazing.
>
> My goal:
> --------
> A file server with numerous folders containing documents (pdf,doc,txt etc.)
> that need to be indexed and searchable via a web interface or similar.
> The number of files might be from 500 000 to 1 000 000 or so.
> Ideally the solution would be capable of handling a lot more than that,
> in case of future growth.
>
> My question:
> ------------
> Which of the projects (Lucene, Solr, Nutch) will be most suitable in my case?
>
> Thank you much.
>
> --
> Veselin K
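
For illustration, here is a minimal sketch of the kind of script described in the reply: walk a folder tree and post each file in turn to Solr's extracting handler. It is not from the original thread. The endpoint URL (a local Solr with the handler mapped at /update/extract), the use of the file path as the document id, and the helper names (iter_files, build_extract_url, post_file, index_tree) are all assumptions, and parameter names such as literal.id follow later Solr releases and may differ on a given trunk snapshot.

```python
import os
import urllib.parse
import urllib.request

# Assumption: a local Solr with the extracting handler mapped at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def iter_files(root):
    """Yield the path of every file under root, recursing into subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def build_extract_url(base_url, doc_id, commit=False):
    """Build the request URL; literal.id supplies the document's unique key."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_file(base_url, path):
    """POST one file's raw bytes to Solr and return the HTTP status code."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(base_url, path),  # assumption: file path doubles as id
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_tree(root, base_url=SOLR_EXTRACT_URL):
    """Post every file under root in turn; return the number of files sent."""
    count = 0
    for path in iter_files(root):
        post_file(base_url, path)
        count += 1
    return count
```

Calling index_tree("/srv/files") (a hypothetical path) would send each file as one request. The sketch never commits, so documents would not become searchable until a commit is issued separately. For a tree approaching a million files, committing once at the end rather than per file is likely the better choice.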