lucene-general mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Indexing local PDFs: Lucene/Solr/Nutch ?
Date Sun, 14 Dec 2008 19:22:00 GMT

: the easiest way to get rolling.  A simple script that recurses your folders
: and issues a simple request posting each file in turn to Solr will give you a
: full text searchable index in no time (well, ok, it'll take a little time, but
: it'll be as fast as anything else out there).

if all the files are "local" on the machine that Solr is running on, you 
don't even need to POST them: Solr can be configured to read the files by 
local filename using the "stream.file" param...

	http://wiki.apache.org/solr/ContentStream
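
a rough sketch of what that recursing script could look like with "stream.file" instead of a POST body. everything specific here is an assumption and not something from this thread: the handler URL (an extracting handler mapped at /update/extract on localhost:8983), the use of "literal.id" to set the unique key to the file path, and the helper names. it also assumes remote streaming has been enabled in solrconfig.xml, otherwise Solr will refuse the "stream.file" param.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed handler location; adjust to wherever your extracting handler is mapped.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_request_url(path, base_url=SOLR_EXTRACT_URL):
    """Return a URL asking Solr to read `path` from its own local disk."""
    full_path = os.path.abspath(path)
    params = {
        "stream.file": full_path,  # filename as seen by the Solr server
        "literal.id": full_path,   # assumption: the path doubles as the unique key
    }
    return base_url + "?" + urlencode(params)


def find_pdfs(root):
    """Yield every .pdf file under `root`, recursing into subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def index_folder(root, base_url=SOLR_EXTRACT_URL):
    """Issue one request per PDF; a commit still has to be sent separately."""
    for path in find_pdfs(root):
        with urlopen(build_request_url(path, base_url)) as resp:
            resp.read()
```

calling index_folder() on the top-level folder (on the same machine as Solr, so the paths resolve) would issue one request per file. no file contents travel over HTTP, only the filename.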

that said: if your fileserver implementation already exposes all of the 
files over HTTP, then using Nutch and its crawler might be an easier way 
to get started on indexing all of them ... hard to say without being in 
your shoes.  you may want to experiment with both.



-Hoss

