manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
Date Tue, 24 Jun 2014 13:30:25 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042098#comment-14042098
] 

Karl Wright edited comment on CONNECTORS-981 at 6/24/14 1:28 PM:
-----------------------------------------------------------------

Hi Alessandro,

I'm afraid I disagree; rather than the primary content being some metadata with an arbitrary
name, it should remain as primary content.  After the Tika Extractor, the stream *has* been
converted to text/plain charset utf-8.  But:
- It's a stream, not a string, because it may be quite large
- Even if it were metadata, it would be a Reader, not a string, and converting it to a string
before indexing would be a bad idea.

Surely SolrInputDocument has provision for handling a character stream?  If not, it's not a
good abstraction for us to be using.
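
To illustrate the stream-versus-string point, here is a minimal sketch using only java.io. It
is not ManifoldCF or SolrJ code: the class StreamingCopy and its copy method are hypothetical
names, and the StringWriter merely stands in for whatever the indexing request would write to.
It shows content from a Reader being passed along in fixed-size chunks, so memory use depends
on the buffer size rather than on the size of the document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper for illustration only; not part of ManifoldCF or SolrJ.
final class StreamingCopy {
    private StreamingCopy() {}

    /**
     * Copies all characters from 'in' to 'out' through a fixed-size buffer.
     * At most 'bufferSize' characters are held by this method at any time.
     * Returns the total number of characters copied.
     */
    static long copy(Reader in, Writer out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        char[] buffer = new char[bufferSize];
        long total = 0;
        int read;
        // Reader.read returns -1 once the end of the stream is reached.
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the Reader stands in for text extracted upstream.
        Reader content = new StringReader("text extracted upstream");
        // Assumption: the Writer stands in for the indexing request body.
        StringWriter sink = new StringWriter();
        long copied = copy(content, sink, 8);
        System.out.println(copied + " characters copied");
    }
}
```

Calling toString() on the whole content up front would instead require the full document to
fit in memory at once, which is the concern raised above for large documents.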





> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
>                 Key: CONNECTORS-981
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-981
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector processor, should
> be able to operate in two ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already-extracted metadata
> To allow the choice, a flag will be added to the UI in the mapping tab (as it relates
> to how the fields will be processed).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
