manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Metadata adjuster
Date Wed, 22 Feb 2017 17:58:48 GMT
Ok, on trunk I added some extra info to the simple history record that the
null output connector logs for each document ingestion.  The extra info
lists each attribute name along with its count.  I then created a job with
one attribute set to a fixed value, and left all other defaults in place.
The output looks like this:

>>>>>>
02-22-2017 17:49:18.851 document ingest (null)
file:/C:/wip/mcf/trunk/README.txt
OK 4970 1 "myAttribute":1,"uri":1
02-22-2017 17:49:18.850 document ingest (null)
file:/C:/wip/mcf/trunk/Livelink.patch
OK 2201 1 "myAttribute":1,"uri":1
02-22-2017 17:49:18.840 document ingest (null)
file:/C:/wip/mcf/trunk/lib/c3p0-0.9.1.1.jar
OK 608376 1 "myAttribute":1,"uri":1
02-22-2017 17:49:18.830 document ingest (null)
file:/C:/wip/mcf/trunk/build.xml.svnpatch.rej
OK 515 1 "myAttribute":1,"uri":1

<<<<<<

It clearly picks up the attribute I injected, and indeed another one that
comes from the file system repository connector too.  Then I unchecked both
checkboxes in the metadata adjuster stage, and ran some more documents.
This is the result:

>>>>>>
02-22-2017 17:54:31.020 document ingest (null)
file:/C:/wip/mcf/trunk/dist/connector-common-lib/jna-4.1.0.jar
OK 914597 1 "myAttribute":1
02-22-2017 17:54:30.985 document ingest (null)
file:/C:/wip/mcf/trunk/lib/google-http-client-jackson2-1.19.0...
.jar
OK 6720 1 "myAttribute":1
<<<<<<

As you can see, it continued to inject the attribute, but now it no longer
passes through the upstream attribute.  This is working as designed.
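
For anyone who wants to compare the two runs above mechanically, here is a
minimal sketch that pulls the attribute counts out of a result line.  The
field layout (result code, byte count, a third numeric field, then the
attribute summary) is only inferred from the samples in this message, and
parse_result_line is a hypothetical helper, not part of MCF.

```python
import re

# Matches one '"name":count' pair in the attribute summary.
ATTR_RE = re.compile(r'"([^"]+)":(\d+)')

def parse_result_line(line):
    """Split one history result line into (code, size, attribute counts).

    Assumed layout, inferred from the samples only:
    <result code> <byte count> <numeric field> <attribute summary>
    """
    code, size, _unused, *rest = line.split(maxsplit=3)
    summary = rest[0] if rest else ""
    attrs = {name: int(count) for name, count in ATTR_RE.findall(summary)}
    return code, int(size), attrs

# Result lines copied from the two runs above.
_, _, before = parse_result_line('OK 4970 1 "myAttribute":1,"uri":1')
_, _, after = parse_result_line('OK 914597 1 "myAttribute":1')

# Attributes that stopped reaching the output connector.
dropped = set(before) - set(after)
print(sorted(dropped))
```

The set difference contains only the upstream attribute, which matches the
behavior described above.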

So it seems clear that the issue must be related to either the Solr output
connector or the Solr configuration.  If you are using MCF 2.4, note that it
does *not* include the SolrJ 6.x version you need to work with Solr 6.x.
That may well be where the trouble lies.  Please upgrade to MCF 2.6 to rule
out that possibility.  If that does not fix the issue, then I will bring
one of our resident Solr experts into the conversation.
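
One possible way to check the Solr side independently of MCF is to send a
test document straight to the extract handler with the attribute attached.
The sketch below only builds the request URL; it does not contact Solr.  It
assumes the Solr Cell convention of passing field values as
literal.<fieldname> parameters, and it takes the host, port, web
application, core and handler from the connection details quoted below.
extract_url, the document id and the attribute value are made up for
illustration.

```python
from urllib.parse import urlencode

def extract_url(protocol, host, port, webapp, core, handler, literals):
    """Build a Solr Cell request URL with literal.<field> parameters.

    Hypothetical helper: the URL layout assumes a standard (non-cloud)
    Solr setup addressed as /<webapp>/<core><handler>.
    """
    base = f"{protocol}://{host}:{port}/{webapp}/{core}{handler}"
    params = {f"literal.{name}": value for name, value in literals.items()}
    params["commit"] = "true"  # make the test document visible right away
    return f"{base}?{urlencode(params)}"

# Connection values from the quoted configuration; the id and the
# attribute value are placeholders.
url = extract_url(
    "http", "solrdev", 8983, "solr", "sites", "/update/extract",
    {"id": "test-doc-1", "myAttribute": "fixedValue"},
)
print(url)
```

If a file posted to that URL ends up in the index without myAttribute, the
Solr schema or handler configuration would be the place to look rather than
the connector.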

Thanks,
Karl

On Wed, Feb 22, 2017 at 11:51 AM, Karl Wright <daddywri@gmail.com> wrote:

> Ah, sorry once again.  It is definitely the update/extract handler in the
> log entry you sent.
>
> I am quite busy at the moment and will review this further this evening.
>
> Thanks,
> Karl
>
>
> On Wed, Feb 22, 2017 at 11:21 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Marisol,
>>
>> The [INFO] log statement you sent earlier was not an /update/extract
>> request, and your Solr connection is set up to send to the Solr Cell
>> /update/extract endpoint.  Can you look again in your logs and find the
>> *right* [INFO] statement?  Thanks!!
>>
>> Karl
>>
>>
>> On Wed, Feb 22, 2017 at 10:52 AM, Marisol Redondo <
>> marisol.redondo.garcia@gmail.com> wrote:
>>
>>> I have formatted it so you have all the information
>>>
>>> Name:  Sites solr dev       Description:  sites core in solr dev
>>> ________________________________________
>>> Connection type: Solr Max connections: 10
>>> ________________________________________
>>> Parameters:
>>> User ID=
>>> ZooKeeper znode path=
>>> Socket timeout=900
>>> Server remove handler=/update
>>> Included mime types=
>>> Use extract update handler=true
>>> Solr created date field name=
>>> ZooKeeper client timeout=60
>>> Solr modified date field name=
>>> Solr core name=sites
>>> Server protocol=http
>>> Realm=
>>> Server name=solrdev
>>> Server status handler=/admin/ping
>>> Password=********
>>> Excluded mime types=
>>> Commits=true
>>> Maximum document length=
>>> Server port=8983
>>> Connection timeout=60
>>> Solr type=standard
>>> Solr filename field name=
>>> Commit within=
>>> Solr id field name=id
>>> Solr mime type field name=
>>> ZooKeeper connect timeout=60
>>> Collection=collection1
>>> Server update handler=/update/extract
>>> Server web application=solr
>>> Solr original size field name=
>>> Solr indexed date field name=
>>> Solr content field name=
>>> ZooKeeper hosts: Host Port:
>>> localhost 2181
>>>
>>> Arguments: Name Value
>>> No arguments
>>>
>>> ________________________________________
>>> Connection status: Connection working
>>>
>>>
>>>
>>>
>>
>
