manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-981) Solr Connector - classic Solrj SolrInputDocument support
Date Tue, 24 Jun 2014 23:41:24 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042728#comment-14042728 ]

Karl Wright edited comment on CONNECTORS-981 at 6/24/14 11:41 PM:
------------------------------------------------------------------

So, let us talk about solutions.  I think there are two possibilities:

(1) Use SolrInputDocument and also modify Solr Connector to have a user-settable length limit.
OR
(2) Continue to use ContentStreamUpdateRequest in HttpPoster, but modify the code to expect
the RepositoryDocument to contain a utf-8-encoded input stream.
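To make the two options concrete, here is a minimal sketch using only the JDK. The class and method names (ContentSketch, readWithLimit, toUtf8) are hypothetical and are not part of ManifoldCF or SolrJ. The first method shows the kind of length check option (1) would need before content is held in memory as a field value; the second shows one way to present content in a known source charset as a UTF-8 byte stream for option (2). How either would be wired into HttpPoster is not shown and is left open here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not ManifoldCF or SolrJ code. */
final class ContentSketch {

  /**
   * Option (1): read the content fully into memory, but give up once it
   * exceeds maxChars. Returns null when the limit is exceeded.
   */
  static String readWithLimit(Reader reader, int maxChars) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int n;
    while ((n = reader.read(buf)) != -1) {
      if (sb.length() + n > maxChars) {
        return null; // too large for the configured limit
      }
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }

  /**
   * Option (2): expose decoded characters as a UTF-8 byte stream without
   * buffering the whole document. The caller supplies a Reader that already
   * decodes the source charset.
   */
  static InputStream toUtf8(final Reader reader) {
    return new InputStream() {
      private final char[] buf = new char[4096];
      private byte[] chunk = new byte[0];
      private int pos = 0;
      private int carried = -1; // high surrogate held back from the previous chunk

      private boolean fill() throws IOException {
        while (pos >= chunk.length) {
          int off = 0;
          if (carried >= 0) {
            buf[0] = (char) carried;
            off = 1;
            carried = -1;
          }
          int n = reader.read(buf, off, buf.length - off);
          if (n == -1) {
            if (off == 0) {
              return false; // end of input, nothing pending
            }
            n = 0; // flush the dangling surrogate as-is
          }
          int len = off + n;
          // Do not split a surrogate pair across two encoded chunks.
          if (n > 0 && Character.isHighSurrogate(buf[len - 1])) {
            carried = buf[len - 1];
            len--;
          }
          chunk = new String(buf, 0, len).getBytes(StandardCharsets.UTF_8);
          pos = 0;
        }
        return true;
      }

      @Override
      public int read() throws IOException {
        if (!fill()) {
          return -1;
        }
        return chunk[pos++] & 0xff;
      }
    };
  }
}
```

Returning null from readWithLimit is just one way to signal "too long"; a real connector might instead skip the content field or reject the document, depending on how the user-settable limit is meant to behave.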

It's worth noting that the HttpSolrServer.add(SolrInputDocument) method does the following:

{code}
  public UpdateResponse add(SolrInputDocument doc, int commitWithinMs)
      throws SolrServerException, IOException {
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(commitWithinMs);
    return req.process(this);
  }
{code}

Since both ContentStreamUpdateRequest and UpdateRequest are extensions of AbstractUpdateRequest,
and AbstractUpdateRequest is where content stream support lives, there may be a way to do
this by adding a content stream to an UpdateRequest object directly.  I'll have to look deeper
at the UpdateRequest code to see if that has any chance of working.



was (Author: kwright@metacarta.com):
So, let us talk about solutions.  I think there are two possibilities:

(1) Use SolrInputDocument and also modify Solr Connector to have a user-settable length limit.
OR
(2) Continue to use ContentStreamUpdateRequest in HttpPoster, but modify the code to expect
the RepositoryDocument to contain a utf-8-encoded input stream.

It's worth noting that the HttpSolrServer.add(SolrInputDocument) method does the following:

{code}
  public UpdateResponse add(SolrInputDocument doc, int commitWithinMs)
      throws SolrServerException, IOException {
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(commitWithinMs);
    return req.process(this);
  }
{code}

Since both ContentStreamUpdateRequest and UpdateRequest are extensions of AbstractUpdateRequest,
it is perfectly reasonable to continue to use ContentStreamUpdateRequest instead of trying
to force everything into SolrInputDocument.  And that way, the problem is effectively solved.
 The only thing you'd want to do is research the differences between UpdateRequest and ContentStreamUpdateRequest
to be sure that we pick the same target URL.


> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
>                 Key: CONNECTORS-981
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-981
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector processor, should be able to operate in 2 ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted metadata
> To allow the choice, a flag will be added in the UI in the mapping tab (as it's related to how the fields will be processed)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
