lucene-solr-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-906) Buffered / Streaming SolrServer implementaion
Date Sun, 04 Jan 2009 14:01:45 GMT

    [ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660572#action_12660572
] 

Noble Paul commented on SOLR-906:
---------------------------------

Please ignore the 40K docs figure. I just took it from your perf test numbers. I thought you
were writing the docs as a list.

I am referring to the client code, specifically this method in UpdateRequest:
{code}
public Collection<ContentStream> getContentStreams() throws IOException {
    return ClientUtils.toContentStreams( getXML(), ClientUtils.TEXT_XML );
}
{code}

This means that getXML() constructs one huge String holding the entire XML. That is not
good when we are writing out a very large number of docs.

I am suggesting that CommonsHttpSolrServer has scope for improvement. Instead of building that
String in memory, we can start streaming it to the server right away. The OutputStream can be
passed on to UpdateRequest so that it writes the XML directly into the stream. That way there
is effectively streaming on both ends.
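
A rough sketch of that idea, not actual SolrJ code: the class StreamingXmlSketch, the method writeXml, and the use of a plain Map per document are all assumptions made up for illustration. It only shows the shape of writing the add request into an OutputStream one doc at a time instead of returning a String.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch, not SolrJ code: every name here is invented for illustration.
class StreamingXmlSketch {

    // Writes an <add> request straight to the stream, one doc at a time,
    // so the full XML never exists as a single String in memory.
    static void writeXml(Iterator<Map<String, String>> docs, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<add>");
        while (docs.hasNext()) {
            w.write("<doc>");
            for (Map.Entry<String, String> field : docs.next().entrySet()) {
                w.write("<field name=\"" + escape(field.getKey()) + "\">");
                w.write(escape(field.getValue()));
                w.write("</field>");
            }
            w.write("</doc>");
        }
        w.write("</add>");
        w.flush(); // flush but do not close: the caller owns the stream
    }

    // Minimal XML escaping for text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws IOException {
        // The byte array stands in for the HTTP request body.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Iterator<Map<String, String>> docs =
                Collections.singletonList(Collections.singletonMap("id", "doc-1")).iterator();
        writeXml(docs, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In a real client the OutputStream would be the request body of the open HTTP connection, so memory use stays roughly constant regardless of how many docs are sent.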

This applies when users do bulk updates, not when they write one doc at a time.

The new method SolrServer#add(Iterator<SolrInputDocs> docs) can start writing the docs
immediately, so the docs can be uploaded as they are being produced. It is not exactly
related to this issue, but the intent of this issue is to make uploads faster.
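
To illustrate why an Iterator-based add helps, here is a small self-contained sketch. The names LazyAddSketch, produce, and add are hypothetical, docs are plain Strings, and a StringBuilder stands in for the connection. It records the order of events to show that each doc is sent before the next one is even created.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these names are SolrJ API.
class LazyAddSketch {

    // Produces documents on demand and logs each one as it is created.
    static Iterator<String> produce(final int count, final List<String> log) {
        return new Iterator<String>() {
            private int next = 0;

            @Override public boolean hasNext() { return next < count; }

            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String doc = "doc" + next++;
                log.add("produced " + doc);
                return doc;
            }
        };
    }

    // Stand-in for an add(Iterator) method: sends each doc as soon as it is pulled.
    static void add(Iterator<String> docs, StringBuilder wire, List<String> log) {
        while (docs.hasNext()) {
            String doc = docs.next();
            wire.append(doc).append('\n'); // the StringBuilder plays the role of the connection
            log.add("sent " + doc);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        StringBuilder wire = new StringBuilder();
        add(produce(3, log), wire, log);
        log.forEach(System.out::println); // "produced" and "sent" lines alternate
    }
}
```

With a List argument, all docs would have to exist before the first byte is written. With an Iterator, production and upload interleave.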


SOLR-865 is not closely related to this issue. StreamingHttpSolrServer can use the javabin
format as well.

bq. with the StreamingHttpSolrServer, you can send documents one at a time and each document
starts sending as soon as it can
One drawback of a StreamingHttpSolrServer is that it ends up opening multiple connections
for uploading the documents.

Another enhancement: we can add one (or more) extra threads in the server to make the call
to UpdateRequestProcessor.processAdd().
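
A minimal sketch of what such a hand-off could look like, under stated assumptions: the Processor interface below is a made-up stand-in for the update processor, docs are plain Strings, and the pool size and queue bound are arbitrary. It is not how Solr's request handling is actually structured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the request thread keeps parsing while workers call processAdd.
class AsyncProcessAddSketch {

    // Made-up stand-in for the server-side update processor.
    interface Processor { void processAdd(String doc); }

    static void handle(Iterator<String> parsedDocs, Processor processor, int workers)
            throws InterruptedException {
        // Bounded queue plus CallerRunsPolicy: if the workers fall behind, the parsing
        // thread runs the task itself instead of buffering an unbounded number of docs.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(64),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            while (parsedDocs.hasNext()) {
                final String doc = parsedDocs.next();
                pool.execute(() -> processor.processAdd(doc));
            }
        } finally {
            pool.shutdown(); // no new tasks; queued ones still run
        }
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("doc" + i);
        List<String> indexed = Collections.synchronizedList(new ArrayList<String>());
        handle(docs.iterator(), indexed::add, 2);
        System.out.println(indexed.size() + " docs processed");
    }
}
```

Note that this sketch does not preserve document order, which may matter when the same id is added more than once in a single request.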

> Buffered / Streaming SolrServer implementaion
> ---------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch,
SOLR-906-StreamingHttpSolrServer.patch, SOLR-906-StreamingHttpSolrServer.patch, StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( SolrInputDocument )
is less than optimal.  This makes a new request for each document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to a single
open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

