manifoldcf-dev mailing list archives

From "Alessandro Benedetti (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
Date Tue, 24 Jun 2014 14:18:24 GMT


Alessandro Benedetti commented on CONNECTORS-981:

I follow you, but let's look at what Tika does in Solr's extracting request handler:
it extracts the stream and puts it in a Solr field ("content"), which is a string.

So, by using the Solr Connector in "No Extract" mode you are saying that the content has
already been extracted, so I don't see the problem with having it as a string.
I guess it will be normal, when using the Tika extractor, to have a String copy of the binary
stream in the RepositoryDocument, to be processed afterwards by the output connector.
This is how Solr works, and I suppose it should be the correct behaviour when you choose
not to use the extracting request handler.
But if you think it's better, I can add a line in the Solr Connector that transforms the binary
stream into a String and then creates the field in the SolrInputDocument.
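
To make the proposal concrete, here is a rough sketch of the conversion, not the actual patch.
It assumes the stream already holds UTF-8 text produced by an upstream Tika step. The class
name, the method names and the Map standing in for SolrInputDocument are only illustrative;
with SolrJ the put() call would presumably become SolrInputDocument.addField("content", text).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class NoExtractSketch {

    // Read the whole binary stream and decode it as text.
    // Assumption: the stream contains already extracted UTF-8 text.
    static String streamToString(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for building a SolrInputDocument:
    // the extracted text goes into the "content" field as a plain String.
    static Map<String, Object> buildFields(InputStream binaryStream) throws IOException {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("content", streamToString(binaryStream));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "already extracted text".getBytes(StandardCharsets.UTF_8));
        System.out.println(buildFields(in));
    }
}
```

Buffering the whole stream in memory is the simplest option, but it may not suit very large
documents; that trade-off is outside the scope of this sketch.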

> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>                 Key: CONNECTORS-981
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>         Attachments: CONNECTORS-981.patch
> The Solr connector, in line with the development of the Tika Connector processor, should
> be able to operate in 2 ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted metadata
> To allow the choice, a flag will be added in the UI in the mapping tab (as it's related
> to how the fields will be processed)

This message was sent by Atlassian JIRA
