manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Ingest all documents into Solr everytime the filesystem is crawled
Date Thu, 27 Dec 2012 14:28:49 GMT
If you go to the Solr output connection's "view" screen, there's a
button labeled something like "Reingest all documents".  Click that
and ManifoldCF will 'forget' what it previously indexed into Solr, so
the next job run sends every document again.

Karl
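
To illustrate why an unchanged document is skipped and why "forgetting"
fixes it, here is a toy model in Python. It is only a sketch of the general
incremental-indexing idea, not ManifoldCF's actual implementation: the class
ToyIndexer and its methods crawl() and forget_ingested() are hypothetical
names made up for this example.

```python
# Toy model only; all names here are hypothetical, not ManifoldCF APIs.
class ToyIndexer:
    def __init__(self):
        self.ingested = {}  # doc id -> version last sent to the index
        self.sent = []      # log of documents actually sent

    def crawl(self, docs):
        """Send only documents whose version differs from the recorded one."""
        for doc_id, version in docs.items():
            if self.ingested.get(doc_id) != version:
                self.sent.append(doc_id)
                self.ingested[doc_id] = version

    def forget_ingested(self):
        """Rough analogue of the 'reingest' button: drop the record."""
        self.ingested.clear()


indexer = ToyIndexer()
docs = {"a.txt": "v1"}
indexer.crawl(docs)        # first run: a.txt is sent
indexer.crawl(docs)        # second run: unchanged, so it is skipped
indexer.forget_ingested()  # index was wiped externally; clear the record
indexer.crawl(docs)        # a.txt is sent again
print(indexer.sent)
```

In this model, wiping the search index alone changes nothing on the crawler
side, which is why the record of what was ingested has to be cleared too.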

On Thu, Dec 27, 2012 at 9:16 AM, Tasat Bar <tasatbar@gmail.com> wrote:
> Hi there,
>
> Over the past few days I've been playing around with Solr and ManifoldCF...
> and I have to say, I'm quite impressed that everything works so well :)
>
> However, I have a short question: When using the filesystem connector, is
> there a way to let ManifoldCF always send all documents to Solr, no matter
> whether they have been crawled/sent before?
> I set up a job to crawl the local file system and send documents to Solr.
> When starting the job for the first time, everything works perfectly well
> and the document is successfully ingested into Solr.
> The problem is that when I delete Solr's index (I'm still playing around
> with Solr, so that happens from time to time) and restart the ManifoldCF
> job, the document is not sent to Solr (again) - probably because it assumes
> that this is not necessary, since the document did not change. What I then
> did was to clear the crawled directory, start the crawl job (ManifoldCF
> realises that the directory is empty), re-populate the directory and restart
> the crawl job.
> I definitely don't want to set the crawl jobs up like this later on, but for
> testing that would be quite handy.
>
> I hope I didn't accidentally overlook something in the user documentation or
> in the mailing list... Any help/hint is appreciated.
>
> Cheers,
> Tasat
>
>
>
