manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
Date Tue, 24 Jun 2014 14:30:25 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042170#comment-14042170 ]

Karl Wright commented on CONNECTORS-981:
----------------------------------------

Hi Alessandro,

One of the principal ways we make ManifoldCF robust is to make sure that memory usage is
"bounded". That is, the crawler cannot use more than a set amount of memory, no matter what
the inputs are. See https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/, chapter 6,
section 6.3.5.

Solr can, of course, make a different decision as a project; we choose to enforce a limit.
I suppose you could, for example, cap the number of bytes sent to Solr at 64K, but I suspect
people would not like that.
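The bounded-memory rule and the 64K cap discussed above can be illustrated with a minimal
sketch. BoundedBuffer and readBounded are hypothetical names, not part of ManifoldCF or
Solrj; the sketch only shows the general technique of holding at most a fixed number of bytes
of a document in memory, assuming the caller decides what to do when the input is truncated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not a ManifoldCF or Solrj API): buffers at most
 * maxBytes of a document stream, so heap usage stays bounded no matter
 * how large the input is.
 */
class BoundedBuffer {

    /** Result of a bounded read: the bytes kept, plus whether input was cut off. */
    public static final class Result {
        public final byte[] data;
        public final boolean truncated;

        Result(byte[] data, boolean truncated) {
            this.data = data;
            this.truncated = truncated;
        }
    }

    /** Reads up to maxBytes from in; bytes beyond the limit are never stored. */
    public static Result readBounded(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.min(maxBytes, 8192));
        byte[] chunk = new byte[8192];
        int remaining = maxBytes;
        while (remaining > 0) {
            // Never request more than the remaining budget.
            int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n == -1) {
                return new Result(out.toByteArray(), false); // input fit within the limit
            }
            out.write(chunk, 0, n);
            remaining -= n;
        }
        // Limit reached: read one more byte (and discard it) to tell
        // "exactly maxBytes" apart from "there was more data".
        boolean truncated = in.read() != -1;
        return new Result(out.toByteArray(), truncated);
    }

    public static void main(String[] args) throws IOException {
        final int limit = 64 * 1024; // the 64K figure from the comment
        byte[] big = new byte[100_000];
        Result r = readBounded(new ByteArrayInputStream(big), limit);
        System.out.println(r.data.length + " bytes kept, truncated=" + r.truncated);
    }
}
```

Running main prints "65536 bytes kept, truncated=true". The trade-off Karl points out is
visible here: memory stays bounded, but anything past the cap is simply not sent.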


> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
>                 Key: CONNECTORS-981
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-981
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector processor, should
> be able to operate in two ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted metadata
> To allow the choice, a flag will be added to the UI in the mapping tab (as it relates
> to how the fields will be processed).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
