manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawler output transformation before indexing into Solr
Date Tue, 24 Jul 2012 15:41:16 GMT
Solr Cell is what you want to use here. It's a Tika pipeline that you
can configure to modify the data as you need.

Karl
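
A minimal sketch of what such a configuration can look like at the request
level, assuming Solr Cell is exposed at its usual /update/extract handler.
The fmap.*, literal.* and uprefix parameters belong to Solr Cell; the helper
name build_extract_url, the base URL, and the field names "text" and
"location" are hypothetical and only illustrate the idea. The location value
itself would still have to be computed somewhere; this only shows how a value
can be routed into a separate field.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, field_map=None, literals=None):
    """Build a Solr Cell (/update/extract) URL. Hypothetical helper.

    field_map: Tika field name -> Solr field name (sent as fmap.<src>=<dst>)
    literals:  Solr field name -> fixed value     (sent as literal.<name>=<value>)
    """
    params = [("literal.id", doc_id)]
    for src, dst in (field_map or {}).items():
        params.append(("fmap." + src, dst))
    for name, value in (literals or {}).items():
        params.append(("literal." + name, value))
    # Prefix fields that are not in the schema so they match a dynamic field.
    # "attr_" is an assumed dynamic-field prefix, not something from this thread.
    params.append(("uprefix", "attr_"))
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_url(
        "http://localhost:8983/solr",   # assumed Solr location
        "doc1",
        field_map={"content": "text"},
        literals={"location": "Paris"},
    ))
```

The document bytes would be sent as the POST body of this request; the same
parameters can also be set as defaults on the handler in solrconfig.xml.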

On Tue, Jul 24, 2012 at 11:35 AM, Arcadius Ahouansou
<arcadius@menelic.com> wrote:
>
> Hello.
>
> I am currently using ManifoldCF 0.6 to crawl and index into Solr 4.
>
> I need to extract data such as locations from the documents into a separate
> field before I index into Solr.
>
> - Is there a way this can be done with ManifoldCF?
> - If not, is there an output connector that can store the content in a
> database? Then I could do the transformation on the DB before indexing.
>
> Thank you very much.
>
> Arcadius.
>
>
