manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Metadata adjuster
Date Wed, 22 Feb 2017 14:53:22 GMT
Hi Marisol,

The [INFO] log entries indicate that your document has almost no metadata
at all.  But the Metadata Adjuster transformation connector is designed to
do exactly what you want.
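
As a rough illustration of how that can be read off the log: when documents are sent to /update/extract, metadata fields arrive as literal.<fieldname> request parameters, so the params={...} part of the [INFO] line shows which fields were sent. The sketch below is only an assumption-laden helper, not part of ManifoldCF or Solr: the function name literal_fields is hypothetical, the sample line is abbreviated from the log quoted further down, and the parsing assumes no parameter value contains "}" or an unescaped "&".

```python
from urllib.parse import parse_qsl


def literal_fields(log_line: str) -> dict:
    """Return the literal.* request parameters found in a Solr update log line.

    Hypothetical helper: assumes the log line contains a 'params={...}' block
    and that no parameter value contains '}' or an unescaped '&'.
    """
    marker = "params={"
    start = log_line.find(marker)
    if start < 0:
        return {}
    start += len(marker)
    end = log_line.find("}", start)
    if end < 0:
        return {}
    pairs = parse_qsl(log_line[start:end], keep_blank_values=True)
    prefix = "literal."
    # Keep only literal.<field> parameters and strip the prefix.
    return {k[len(prefix):]: v for k, v in pairs if k.startswith(prefix)}


# Abbreviated copy of the [INFO] line from the Solr log in this thread.
line = ("... path=/update/extract params={resource.name=introduction.pdf"
        "&literal.id=https://...../introduction.pdf&wt=xml&version=2.2}{} 0 347")

fields = literal_fields(line)
print(sorted(fields))                # ['id']
print("facetContentType" in fields)  # False
```

For this log line the only literal field is id, which is consistent with the "missing required field: facetContentType" error: the field was never sent to Solr.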

Can you view your job, and cut and paste the View Job page into an email,
so I can see how your metadata adjuster transformation connection and your
solr output connections are configured?  Thanks!

Karl




On Wed, Feb 22, 2017 at 8:57 AM, Marisol Redondo <
marisol.redondo.garcia@gmail.com> wrote:

> Hi Karl, and thank you for the quick answer.
>
> I was reading the documentation for MCF 1.10, but I'm using MCF 2.5 (sorry
> for the confusion), and I think this version is compatible with Solr 6.
> The PDF doesn't have any metadata or field called facetContentType; that
> is why I have been trying to use the Metadata Adjuster to add a new
> metadata property to the document, so that Solr can index by this field
> when I inject the document.
> Should I use another transformation, or is there another way of doing it?
> I am migrating from Nutch to ManifoldCF. In Nutch we can do this with
> plugins, and I assumed that Nutch plugins are the equivalent of the
> transformation connectors in MCF.
>
> The complete error in Solr is:
>
> 2017-02-21 13:19:32.108 INFO  (qtp1854778591-18) [   x:sites] o.a.s.c.PluginBag Going to create a new requestHandler with {type = requestHandler,name = /update/extract,class = solr.extraction.ExtractingRequestHandler,args = {defaults={lowernames=true,fmap.meta=ignored_,fmap.content=_text_,update.chain=add-unknown-fields-to-the-schema,df=_text_}}}
> 2017-02-21 13:19:32.454 INFO  (qtp1854778591-18) [   x:sites] o.a.s.u.p.LogUpdateProcessorFactory [sites]  webapp=/solr path=/update/extract params={resource.name=introduction.pdf&literal.id=https://...../introduction.pdf&wt=xml&version=2.2}{} 0 347
> 2017-02-21 13:19:32.455 ERROR (qtp1854778591-18) [   x:sites] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: [doc=https://....../introduction.pdf] missing required field: facetContentType
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:197)
>         at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
>         at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
>
>
>
> Thanks
>
>
> On 21 February 2017 at 14:52, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Marisol,
>>
>> Can you find the [INFO] entry in the Solr log for this document?  That
>> should help clear up any confusion.
>>
>> Also, for what it is worth, MCF 1.10 is not using a SolrJ that is up to
>> date with Solr 6.x.  That could be the source of the problem.  Is there any
>> reason you are using a 1.x version of MCF?
>>
>> Karl
>>
>>
>> On Tue, Feb 21, 2017 at 8:42 AM, Marisol Redondo <
>> marisol.redondo.garcia@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> I'm trying to use the Metadata Adjuster to add one field to the Solr index,
>>> but it doesn't inject the field into the corresponding Solr field.
>>> Maybe I'm misunderstanding the use of the Metadata Adjuster, but I have read
>>> in the documentation
>>> (https://manifoldcf.apache.org/release/release-1.10/en_US/end-user-documentation.html)
>>> that I can add metadata to the document that is going to be indexed into
>>> Solr, but the Solr instance gave me the error "missing required field:
>>> facetContentType".
>>>
>>> ManifoldCF Job pipeline:
>>> 1. Repository (type web repository)
>>> 2. Transformation (Tika Metadata Extractor)
>>> 3. Transformation (type Metadata Adjuster)
>>> 4. Output (Solr 6)
>>>
>>> ManifoldCF Job Metadata Expressions tab:
>>>   Parameter name: "facetContentType"
>>>   Remove this parameter: false
>>>   Expression: xxxx  (the literal text value I want in facetContentType)
>>>
>>> Solr schema:
>>>   .....
>>>   <field name="facetContentType" type="string" indexed="true" stored="true" required="true"/>
>>>  ....
>>>
>>> The error logged in ManifoldCF is:
>>>       Error from server at http://solrServer:port/solr/core:
>>>       [doc=https://....../index.aspx] missing required field: facetContentType.
>>>
>>> Thanks for your help
>>>
>>
>>
>
