Hello,
I am now using solr 1.3 with tomcat6 on a debian lenny box.
Could you please point me to any other instructions/HowTos, available online,
on integrating Tika or maybe RichDocumentHandler with Solr?
I mean apart from the Solr Wiki, as following the examples there did not help
in my case.
Thank you.
Veselin K.
On Wed, Dec 17, 2008 at 10:43:57AM +0000, Veselin K wrote:
> Thank you Erik, Hoss.
>
> - If using either Solr's "stream.file" or Nutch's crawler,
> what is the procedure for adding new files?
> That is to say, if I did not know which files in a specific folder
> are new and thus passed all files to Solr/Nutch, would it
> skip the ones that have already been indexed?
>
> - Also what if a file gets modified, would Solr/Nutch detect
> the change and re-index just the modified file?
> Or should some kind of cache be cleared and everything re-indexed?
>
> - In order to provide the user with an option to search the indexes of
> two separate Solr/Nutch servers, do I need to link both servers
> somehow and join their indexes into one, or is it just a question of
> designing the web front-end so that it offers the choice to send your
> search query to one or multiple different servers?
>
>
> Thank you,
> Veselin K
>
>
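On the questions above about new and modified files: nothing in this thread
establishes whether Solr or Nutch skips unchanged files on its own. One
approach that does not depend on either is to track changes on the client
side and pass along only the files that are new or modified. The following is
a minimal sketch of that idea, not a Solr or Nutch feature. The function name
changed_files and the JSON manifest file are hypothetical, and the manifest is
assumed to live outside the folder being scanned.

```python
import json
import os


def changed_files(root, manifest_path):
    """Return files under root that are new or modified since the last run.

    Hypothetical helper: the manifest is a plain JSON file mapping each
    path to its [mtime, size] as seen on the previous run.
    """
    try:
        with open(manifest_path) as fh:
            seen = json.load(fh)
    except FileNotFoundError:
        seen = {}  # first run: every file counts as new

    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            stat = os.stat(path)
            stamp = [stat.st_mtime, stat.st_size]
            current[path] = stamp
            if seen.get(path) != stamp:
                changed.append(path)

    # Written immediately for simplicity; a real script would probably
    # update the manifest only after indexing has succeeded.
    with open(manifest_path, "w") as fh:
        json.dump(current, fh)
    return sorted(changed)
```

Files deleted from the folder are not reported by this sketch; removing them
from the index would need separate handling.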
> On Sun, Dec 14, 2008 at 11:22:00AM -0800, Chris Hostetter wrote:
> >
> > : the easiest way to get rolling. A simple script that recurses your folders
> > : and issues a simple request posting each file in turn to Solr will give you a
> > : full text searchable index in no time (well, ok, it'll take a little time, but
> > : it'll be as fast as anything else out there).
> >
> > if all the files are "local" on the machine that Solr is running on you
> > don't even need to POST them, Solr can be configured to read the files by
> > local filename using the "stream.file" param...
> >
> > http://wiki.apache.org/solr/ContentStream
> >
> > that said: if your fileserver implementation already exposes all of the
> > files over HTTP, then using Nutch and its crawler might be an easier way
> > to get started on indexing all of them ... hard to say without being in
> > your shoes. you may want to experiment with both.
> >
> >
> >
> > -Hoss
> >
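The two suggestions quoted above (a script that recurses the folders, and
the "stream.file" param for files local to the Solr machine) can be combined
roughly as in the Python sketch below. It only builds the request URLs and
does not send them. The base URL and the handler path are assumptions and not
taken from this thread: the stock /update handler expects Solr XML update
messages, so for rich documents the path would have to point at whatever
handler is configured to parse them. Reading local files this way also
requires remote streaming to be enabled in solrconfig.xml.

```python
import os
from urllib.parse import urlencode


def stream_file_urls(root,
                     base_url="http://localhost:8080/solr",  # assumed host/port
                     handler_path="/update"):                # assumed handler
    """Yield one request URL per file under root, using stream.file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Solr reads the file itself, so the path must be valid on
            # the machine Solr runs on.
            path = os.path.abspath(os.path.join(dirpath, name))
            query = urlencode({"stream.file": path})
            yield f"{base_url}{handler_path}?{query}"


if __name__ == "__main__":
    import sys

    # Print the URLs for the folder given on the command line (default: ".").
    for url in stream_file_urls(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(url)
```

Each printed URL can then be requested with any HTTP client, followed by a
commit once all files have been sent.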