manifoldcf-dev mailing list archives

From "Ramanan Sathiyanarayanan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-840) Job - Solr Mapping Improvement
Date Fri, 24 Jul 2015 12:19:05 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640382#comment-14640382
] 

Ramanan Sathiyanarayanan commented on CONNECTORS-840:
-----------------------------------------------------

Hi - Our metadata and content do not live in one place, so we had to write our own connector
to consolidate the data from two different sources. This works fine so far. However, we now need
data from a third source (e.g. usage data), and we want to use this new data point in our
scoring logic in Solr. The data is generated daily in a database, and we need to use JDBCConnector
to update a few fields in Solr. Since we only need to update once a day, we don't want to make
it look as if a RepositoryDocument changed and create unnecessary load on our original connector
and its backends.

1. For both of these jobs, the ID of the document will be the same.
2. Can MCF support two different jobs that produce documents with the same ID?
3. Since the ID will be the same, can I make the Solr output connector perform a partial update?
We may be able to avoid the Tika endpoint, since we are updating a few fields directly.
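
For reference, the kind of partial update meant in point 3 can be sketched with Solr's atomic update syntax, where each field value is wrapped in a "set" modifier so that only that field is replaced. This is only a rough illustration built outside of MCF: the helper name build_atomic_update, the document ID and the usage_count field are made up for the example, and whether the Solr output connector can be made to send such a request is exactly the open question here. Atomic updates also generally require the other fields of the document to be stored (or to have docValues) in the Solr schema.

```python
import json


def build_atomic_update(doc_id, fields):
    """Build one Solr atomic-update document that overwrites only `fields`.

    Hypothetical helper for illustration; it is not part of ManifoldCF.
    """
    if not fields:
        raise ValueError("at least one field is required")
    doc = {"id": doc_id}
    for name, value in fields.items():
        # The "set" modifier replaces just this field and leaves the rest alone.
        doc[name] = {"set": value}
    return doc


# Example: refresh a daily usage counter for one document (names are assumed).
payload = json.dumps([build_atomic_update("doc-123", {"usage_count": 42})])
print(payload)
```

A payload like this would be posted as JSON to the collection's update handler rather than to the extracting (Tika) handler, since no binary content is involved.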

> Job - Solr Mapping Improvement
> ------------------------------
>
>                 Key: CONNECTORS-840
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-840
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>            Priority: Minor
>              Labels: field, mapping, request, solr, update
>             Fix For: ManifoldCF 1.5
>
>         Attachments: CONNECTORS-840.patch
>
>
> "When you configure a job to use a Solr-type output connection, the Solr connection type
> provides a tab called "Field Mapping". The purpose of this tab is to allow you to map metadata
> fields as fetched by the job's connection type to fields that Solr is set up to receive. This
> is necessary because the names of the metadata items are often determined by the repository,
> with no alignment to fields defined in the Solr schema. You may also suppress specific metadata
> items from being sent to the index using this tab.
> Add a new mapping by filling in the "source" with the name of the metadata item from
> the repository, and "target" as the name of the output field in Solr, and click the "Add"
> button. Leaving the "target" field blank will result in all metadata items of that name not
> being sent to Solr."
> In my opinion we should change the way a metadata field is suppressed.
> The most natural way is to express only the mappings of the metadata fields we want
> to keep.
> Any field without a mapping would then not be sent to Solr.
> The improvement would be:
> - the same interface plus a boolean flag; this flag specifies whether metadata fields
> that have no mapping should be sent to Solr under their original names or not sent at
> all.
> In this way, if we want to keep 3 of 100 metadata fields, we don't have to write 100 mapping
> entries (97 of them empty), just 3 entries with the flag activated.
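
The behaviour proposed in the issue description above can be sketched as follows. This is only an illustration of the described semantics, not the code in CONNECTORS-840.patch; the function name apply_field_mapping, the flag name keep_unmapped and the sample field names are all assumptions made for the example.

```python
def apply_field_mapping(metadata, mappings, keep_unmapped):
    """Return the fields that would be sent to Solr.

    metadata      -- dict of repository metadata name -> value
    mappings      -- dict of source name -> target Solr field ("" suppresses)
    keep_unmapped -- the proposed boolean flag: if True, fields without a
                     mapping are passed through under their original names;
                     if False, they are dropped.
    """
    result = {}
    for source, value in metadata.items():
        if source in mappings:
            target = mappings[source]
            if target:  # a blank target suppresses the field, as today
                result[target] = value
        elif keep_unmapped:
            result[source] = value
    return result


# Keep 2 of 4 fields with just 2 mapping entries and the flag switched off.
metadata = {"Author": "a", "Title": "t", "Created": "c", "Internal": "x"}
mappings = {"Author": "author_s", "Title": "title_t"}
print(apply_field_mapping(metadata, mappings, keep_unmapped=False))
```

With keep_unmapped set to True the same call would also pass "Created" and "Internal" through unchanged, which matches the current behaviour for fields that have no mapping entry.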



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
