manifoldcf-dev mailing list archives

From Frédéric Olier <FOl...@wooxo.fr>
Subject RE: [Solr] Error on documents makes ManifoldCF
Date Wed, 21 Oct 2015 15:50:33 GMT
Hi Karl,

Many thanks.

I found the configuration setting to use. It is described here:
http://www.francelabs.com/blog/tutorial-for-combining-manifoldcf-and-solr-for-files-search/

Search the page for "ignoreTikaException".
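
For reference, a minimal sketch of where that parameter would go in solrconfig.xml. The handler name "/update/extract" and the class name are assumed from Solr's stock example configuration, not taken from this thread, so adjust them to match the update handler your ManifoldCF Solr output connection actually uses:

```xml
<!-- Sketch only: handler name and class assumed from the stock Solr example config -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Skip Tika processing exceptions instead of failing the request -->
    <str name="ignoreTikaException">true</str>
  </lst>
</requestHandler>
```

With this set, Solr Cell is documented to skip exceptions raised during Tika processing and index whatever metadata is available, rather than returning an error to the caller.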

I'll test it and see if it fixes my issue.

Fred


-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, October 21, 2015 17:23
To: dev
Subject: Re: [Solr] Error on documents makes ManifoldCF

A standard Google search finds it.

See:

http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201503.mbox/%3C55127866020000250008FD2A@slesmail.veritablelp.com%3E

Karl


On Wed, Oct 21, 2015 at 11:14 AM, Frédéric Olier <FOlier@wooxo.fr> wrote:

> Hi,
>
> Thanks for your reply.
>
> I looked here:
> http://mail-archives.apache.org/mod_mbox/manifoldcf-dev/
>
> But there is no 'search' option.
>
> Any idea how I can search the archive more efficiently?
>
> Thanks
>
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: Wednesday, October 21, 2015 16:47
> To: dev
> Subject: Re: [Solr] Error on documents makes ManifoldCF
>
> Hi Frédéric,
>
> There's a flag in the Solr configuration that makes exceptions from
> Solr Cell (Tika) cause the document to be skipped rather than causing
> ManifoldCF to retry it. I don't remember what it is, but others have
> noted it, and you can search the mail archive to find it.
>
> Thanks,
> Karl
>
>
> On Wed, Oct 21, 2015 at 10:29 AM, Frédéric Olier <FOlier@wooxo.fr> wrote:
>
> > Hi,
> >
> >
> >
> > We integrated Solr with ManifoldCF.
> >
> > We configured Solr to use the OCR engine.
> >
> > When we crawl documents, MCF reads them fine and submits them to Solr.
> >
> > On large files (PDFs, images), the OCR sometimes takes too long,
> > which causes the MCF request to fail.
> >
> > The annoying thing is that MCF does not ignore the file.
> >
> > On the next crawl, the file fails again.
> >
> > How can I tell ManifoldCF to skip a file that fails?
> >
> >
> >
> > Thanks for your reply.
> >
> >
> >
> >
> > *Frédéric OLIER** | Head of Strategic Planning*
> >
> > * 33 442 016 891 33 662 635 031*
> >
> > *WOOXO*
> > Tel: 0811 140 160
> > Fax: 0811 481 507
> > Immeuble Le Forum - Bât A - 3ème étage
> > 515 av. de la Tramontane
> > ZAC Athélia IV
> > 13600 LA CIOTAT
> > FRANCE
> >
> >
> >
> >
> >
>