manifoldcf-dev mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: [Solr] Error on documents makes ManifoldCF
Date Wed, 21 Oct 2015 15:05:49 GMT
Hi Olier,

I think it is onError="skip", which is defined for entity processors.

https://issues.apache.org/jira/browse/SOLR-7076
Does the Extracting Request Handler have a similar config parameter?

Ahmet



On Wednesday, October 21, 2015 5:46 PM, Karl Wright <daddywri@gmail.com> wrote:



Hi Frédéric,

There's a flag in the Solr configuration you can set that will cause
exceptions from Solr Cell (Tika) to cause the document to be skipped rather
than causing ManifoldCF to retry the document.  I don't remember what it is
but others have noted it and you can search the mail archive to find it.
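
One candidate for the flag described above is Solr Cell's `ignoreTikaException` request parameter, which tells the Extracting Request Handler to skip Tika parsing exceptions and index whatever metadata is available. That this is the flag meant here is an assumption, and it covers parsing exceptions only; whether it helps when OCR simply runs too long is not established in this thread. The sketch below passes the parameter on the request URL. The core URL, document id, and helper names are hypothetical, and the same parameter can usually be set as a default on the handler in solrconfig.xml instead.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_core_url, doc_id, ignore_tika_exception=True):
    """Build a Solr Cell (/update/extract) URL for one document."""
    params = {
        "literal.id": doc_id,
        "commit": "true",
        # Assumption: this is the flag referred to in the message above.
        "ignoreTikaException": "true" if ignore_tika_exception else "false",
    }
    return solr_core_url.rstrip("/") + "/update/extract?" + urlencode(params)


def post_document(solr_core_url, doc_id, path, timeout=600):
    """POST a file's bytes to Solr Cell; requires a running Solr instance."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = Request(
        build_extract_url(solr_core_url, doc_id),
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/octet-stream"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `build_extract_url("http://localhost:8983/solr/collection1", "doc-1")` only builds the URL, so it can be checked without a server; `post_document` is the part that actually talks to Solr.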

Thanks,
Karl


On Wed, Oct 21, 2015 at 10:29 AM, Frédéric Olier <FOlier@wooxo.fr> wrote:

> Hi,
>
>
>
> We integrated Solr with ManifoldCF.
>
> We configured Solr to use the OCR engine.
>
>
>
> When we crawl documents, MCF reads the docs fine and submits them to Solr.
>
>
>
> On large files (PDF, images), the OCR sometimes takes too long, which
> causes the MCF request to fail.
>
>
>
> The annoying thing is that MCF does not ignore the file.
>
> On the next crawl, the file fails again.
>
>
>
> How can I tell ManifoldCF to skip a file that fails?
>
>
>
> Thanks for your reply.
>
>
>
> Frédéric OLIER | Head of Strategic Planning
>
> 33 442 016 891 33 662 635 031
>
> WOOXO
> Tel: 0811 140 160
> Fax: 0811 481 507
> Immeuble Le Forum - Bât A - 3ème étage
> 515 av. de la Tramontane
> ZAC Athélia IV
> 13600 LA CIOTAT
> FRANCE
>
>
>
>
>
