lucene-solr-dev mailing list archives

From "Ryan McKinley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-906) Buffered / Streaming SolrServer implementation
Date Thu, 11 Dec 2008 00:26:44 GMT

    [ https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655438#action_12655438 ]

Ryan McKinley commented on SOLR-906:
------------------------------------

> how much of the 3.5minutes -> 30seconds is due to the logging?

~1 min.  When I turn off logging completely, the time is ~2.5 mins.  (Also note that with
3 threads, it is down to 20 sec.)

RE: calling add( doc ) vs add( List<doc> )...
Yes, things are much better if you call add( List<doc> ); however, it is not the most
convenient API if you are running through tons of documents.

I would expect (but have not tried) that adding 40K docs in one call to add( List<doc> )
would take about the same time as this StreamingHttpSolrServer.  It is probably also similar if
you buffer 100 or 1,000 docs at a time, but I have not tried that either.
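
To make the "buffer 100 or 1,000 at a time" idea concrete, here is a minimal sketch of a
client-side buffer that flushes in fixed-size batches.  BatchingBuffer and its sink callback
are hypothetical names used only for illustration; the sink stands in for a call such as
add( List<doc> ) and is not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects items and hands them to a sink in fixed-size batches. */
final class BatchingBuffer<T> {
    private final int batchSize;
    // Assumption: the sink stands in for something like server.add( List<doc> ).
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and flushes as soon as a full batch is available. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once at the end to push the last partial batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its list
        pending.clear();
    }
}
```

The caller still has to remember the final flush(), which is part of why this is less
convenient than calling add( doc ) and letting the client handle the buffering.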

The StreamingHttpSolrServer essentially handles the buffering for you.  It keeps an HTTP connection
open as long as the Queue has docs to send.  It can start multiple threads that drain the same
Queue simultaneously.
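
The queue-plus-threads pattern described above can be sketched roughly as follows.  This is
not the code from the attached patch: QueueDrainer, maxBatch and the sender callback are
hypothetical, and the sender merely stands in for writing docs to an open connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: several worker threads drain one shared queue in batches. */
final class QueueDrainer<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private volatile boolean closed = false;

    QueueDrainer(int threadCount, int maxBatch, Consumer<List<T>> sender) {
        if (threadCount < 1 || maxBatch < 1) {
            throw new IllegalArgumentException("threadCount and maxBatch must be >= 1");
        }
        workers = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        // Wait briefly for work so the thread can notice close().
                        T first = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (first == null) {
                            if (closed && queue.isEmpty()) {
                                return; // nothing left to send
                            }
                            continue;
                        }
                        List<T> batch = new ArrayList<>();
                        batch.add(first);
                        queue.drainTo(batch, maxBatch - 1); // grab whatever else is ready
                        // Assumption: sender stands in for writing to the open connection.
                        // A failure here is not reported back to the caller of add().
                        sender.accept(batch);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    /** Returns immediately; the doc is sent later by one of the worker threads. */
    void add(T doc) {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        queue.add(doc);
    }

    /** Stops accepting docs and waits (up to a minute) for the queue to empty. */
    void close() {
        closed = true;
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because add( doc ) returns before anything is sent, a failure in the sender surfaces on a
worker thread rather than at the call site, which illustrates the error-reporting trade-off.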

Essentially, this just offers an easier interface to get the best possible performance.  The
trade-off (for now) is that there is no good error reporting.



> Buffered / Streaming SolrServer implementation
> ----------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>             Fix For: 1.4
>
>         Attachments: SOLR-906-StreamingHttpSolrServer.patch, StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( SolrInputDocument )
> is less than optimal.  This makes a new request for each document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to a single
> open HTTP connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

