manifoldcf-dev mailing list archives

From "Markus Schuch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1482) Mime type exclusion and document length exclusion in Solr output connector don't apparently work
Date Wed, 10 Jan 2018 17:09:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320655#comment-16320655 ]

Markus Schuch commented on CONNECTORS-1482:
-------------------------------------------

{quote}
First, you can only exclude mime types if you are using the extracting update handler
{quote}
Why is that so? As I understand it, in the SolrJ case the binary content has to be extracted
by a {{DocTransformer}} or something else, but the upstream repository connectors could still
decide not to send the document to the pipeline at all, couldn't they?
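
To illustrate what I mean, here is a minimal sketch of such an upstream check. It is purely hypothetical: the class {{UpstreamFilterSketch}} and the method {{shouldSend}} are made-up names and not part of the ManifoldCF API, and I am assuming the exclusion list holds plain mime type strings rather than regular expressions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: none of these names exist in ManifoldCF.
final class UpstreamFilterSketch {
    private final Set<String> excludedMimeTypes = new HashSet<>();
    private final long maxLength;

    UpstreamFilterSketch(Set<String> excludedMimeTypes, long maxLength) {
        // Store the exclusions lower-cased so the comparison is case-insensitive.
        for (String type : excludedMimeTypes) {
            this.excludedMimeTypes.add(type.trim().toLowerCase(Locale.ROOT));
        }
        this.maxLength = maxLength;
    }

    // Returns true if the document should be handed to the pipeline at all.
    boolean shouldSend(String mimeType, long length) {
        if (length > maxLength) {
            return false; // too long: skip before any content is fetched or extracted
        }
        if (mimeType == null) {
            return true; // unknown type: assumed to pass in this sketch
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return !excludedMimeTypes.contains(base);
    }
}
```

With a check like this, the decision would not depend on which update handler the output connector uses, because it is made before the document ever reaches the pipeline.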

> Mime type exclusion and document length exclusion in Solr output connector don't apparently work
> ------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1482
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1482
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 2.9
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.10
>
>         Attachments: problem_documents_connector.png, problem_documents_connector_solr.png, problem_documents_connector_solr_stream_size.png
>
>
> See attached images.  Setting exclusions apparently does not prevent documents with that
> mime type from being included.  This may be because of regexp characters etc but it needs
> to be researched and documented at least.  Also, the length limitation doesn't seem to be
> working either.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
