manifoldcf-user mailing list archives

From Arcadius Ahouansou <arcad...@menelic.com>
Subject Content filtering/exclusion with MCF
Date Tue, 28 Apr 2015 11:01:57 GMT
Hello.

I am using MCF 2.0.2 for crawling the web and ingesting data into Solr.

MCF has ingested into Solr documents that returned an HTTP error status
(say 401, 403 or 404) or that contain certain content, such as "this page
has expired and has been removed".

The question is:
is there a way to tell MCF to ingest
- only documents that do not contain certain content, such as "Not Found", or
- only documents whose HTTP status is not 401, 403, 404, 500, ...?
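
To make the rule concrete, here is a minimal sketch in plain Java of the
check I have in mind. It is not MCF API: the class IngestFilter and the
method shouldIngest are hypothetical names, and matching the phrases
case-insensitively is my own assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF: it only illustrates the rule.
final class IngestFilter {
    // Status codes named in the question; extend as needed.
    private static final Set<Integer> EXCLUDED_STATUS = Set.of(401, 403, 404, 500);

    // Phrases named in the question; matched case-insensitively (assumption).
    private static final List<String> EXCLUDED_PHRASES = List.of(
            "not found",
            "this page has expired and has been removed");

    // Returns true when the document should be sent to Solr.
    static boolean shouldIngest(int httpStatus, String body) {
        if (EXCLUDED_STATUS.contains(httpStatus)) {
            return false;
        }
        if (body == null) {
            return true; // nothing to inspect, so only the status decides
        }
        String lower = body.toLowerCase(Locale.ROOT);
        for (String phrase : EXCLUDED_PHRASES) {
            if (lower.contains(phrase)) {
                return false;
            }
        }
        return true;
    }
}
```

With this rule, a 200 response whose body says "Not Found" would be skipped,
and so would any 404 response regardless of its body.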

Thank you very much.

Arcadius.
