lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9824) Documents indexed in bulk are replicated using too many HTTP requests
Date Fri, 09 Dec 2016 13:14:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735278#comment-15735278
] 

David Smiley commented on SOLR-9824:
------------------------------------

OK, I get that we want a flag to articulate that we want long-running threads.  It's really long-running
_connections_ specifically that we want; it sounds clearer to say it that way IMO.  Maybe
longLastingThreads/Connections could be synonymous with a poll time of MAX_VALUE?
Some other review comments:
* longLastingThreads (perhaps better renamed longLastingConnections) should be final.
* Both places where poll() is invoked have the same surrounding try-catch InterruptedException
logic; it could be extracted to a method that returns the Update (possibly null).
** Say, what's the harm in removing the {{if !longLastingThreads}} so that it doesn't propagate;
it just finishes the connection?  That sounds like the right behavior always.
* I presume you're going to effectively revert SOLR-7333 and all related changes?
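To make the extraction suggestion concrete, here's a minimal sketch of what I mean; the class, field, and method names here are hypothetical stand-ins, not the actual patch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PollHelperSketch {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final long pollQueueTimeMs; // stand-in for CUSC's poll time

    PollHelperSketch(long pollQueueTimeMs) {
        this.pollQueueTimeMs = pollQueueTimeMs;
    }

    void offer(Object update) {
        queue.offer(update);
    }

    /** One place for the try-catch InterruptedException logic around poll(). */
    Object pollUpdate() {
        try {
            return queue.poll(pollQueueTimeMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return null; // caller treats null as "nothing to send"
        }
    }
}
```

Both poll() call sites would then just call this and handle a null return.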

RE Futures: I get what you're saying but I think it introduces needless complexity.  We're
conditionally creating a Map, which needs to be populated and have items removed under the
safety of synchronized.  Runners need to be submitted differently to get Futures.  I think that's
much more complex than saving the current Thread to a field on the Runner.
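A minimal sketch of the simpler alternative, under the assumption that the Runner records its own thread when it starts (names here are hypothetical, not the actual CUSC code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class RunnerSketch implements Runnable {
    private volatile Thread runnerThread; // recorded when run() starts
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    @Override
    public void run() {
        runnerThread = Thread.currentThread();
        try {
            while (true) {
                // blocks in poll() until an update arrives or we are interrupted
                String update = queue.poll(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
                if (update == null) break;
                // ... send the update over the connection ...
            }
        } catch (InterruptedException e) {
            // interrupted by interruptRunner(): finish the connection and exit
        }
    }

    /** What blockUntilFinished would call instead of Future.cancel(true). */
    void interruptRunner() {
        Thread t = runnerThread;
        if (t != null) {
            t.interrupt();
        }
    }
}
```

No Map, no synchronized bookkeeping, no change to how Runners are submitted.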

bq. (RE interruption) It's fine - in our case we know all the outstanding documents have been
added to the queue by the time we are in the blockuntilfinished block. We don't access CUSS
in a multi threaded manner. Once we are in blockUntilFinished and the queue is empty, we know
we are just interrupting poll.

The spot where blockUntilFinished triggers interruption (via Future.cancel) indeed only occurs
when the queue is empty (and we can be sure of that, and that no new docs will get added).
 But the Runner could be in the middle of sending the last document it got.  I have no idea
whether, somewhere internal to that, the interrupt might cause the sending of that document
to fail and/or throw.
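For readers following along, the poll-timeout behavior this issue revolves around can be sketched like this.  The 0 and 25ms timeouts come from SOLR-7333 and the javabin default; the class and method names are hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueuePollSketch {
    static final long DEFAULT_POLL_MS = 25; // javabin default poll time

    /** Stand-in for an update carrying the isLastDocInBatch flag. */
    static class Update {
        final boolean lastDocInBatch;
        Update(boolean lastDocInBatch) { this.lastDocInBatch = lastDocInBatch; }
    }

    private final BlockingQueue<Update> queue = new LinkedBlockingQueue<>();

    void offer(Update u) { queue.offer(u); }

    /**
     * After a doc flagged lastDocInBatch, the Runner polls with a 0 timeout
     * so the connection can close immediately; otherwise it waits the default.
     */
    Update nextUpdate(Update previous) throws InterruptedException {
        long timeoutMs = (previous != null && previous.lastDocInBatch)
                ? 0 : DEFAULT_POLL_MS;
        // If every document is (wrongly) flagged lastDocInBatch, this returns
        // null after each doc, the Runner exits, and a new HTTP request is
        // made per document -- the flood of /update requests described below.
        return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```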

> Documents indexed in bulk are replicated using too many HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: SOLR-9824
>                 URL: https://issues.apache.org/jira/browse/SOLR-9824
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.3
>            Reporter: David Smiley
>         Attachments: SOLR-9824.patch, SOLR-9824.patch, SOLR-9824.patch
>
>
> This takes a while to explain; bear with me. While working on bulk indexing small documents,
I looked at the logs of my SolrCloud nodes.  I noticed that shards would see an /update log
message every ~6ms, which is *way* too much.  These are requests from one shard (that isn't
a leader/replica for these docs but the recipient from my client) to the target shard leader
(no additional replicas).  One might ask why I'm not sending docs to the right shard in the
first place; I have a reason but it's beside the point -- there's a real Solr perf problem
here, and it probably applies equally to replicationFactor>1 situations too.  I could
turn off the logs but that would hide useful stuff, and it's disconcerting to me that so many
short-lived HTTP requests are happening, somehow at the behest of DistributedUpdateProcessor.
 After lots of analysis and debugging and hair pulling, I finally figured it out.
> In SOLR-7333, [~tpot] introduced an optimization called {{UpdateRequest.isLastDocInBatch()}}
in which ConcurrentUpdateSolrClient will poll the internal queue with a '0' timeout, so
that it can close the connection without it hanging around any longer than needed.  This part
makes sense to me.  Currently the only spot that has the smarts to set this flag is {{JavaBinUpdateRequestCodec.unmarshal.readOuterMostDocIterator()}}
at the last document.  So if a shard received docs in a javabin stream (but not other formats),
one would expect the _last_ document to have this flag.  There's even a test.  Docs without
this flag get the default poll time; for javabin it's 25ms.  Okay.
> I _suspect_ that if someone used CloudSolrClient or HttpSolrClient to send javabin data
in a batch, the intended efficiencies of SOLR-7333 would apply.  I didn't try. In my case,
I'm using ConcurrentUpdateSolrClient (and BTW DistributedUpdateProcessor uses CUSC too). 
CUSC uses the RequestWriter (defaulting to javabin) to send each document separately, without
any leading or trailing marker.  For the XML format, by comparison, there is a leading
and trailing marker (<stream> ... </stream>).  Since there's no outer container
for the javabin unmarshalling to detect the last document, it marks _every_ document as {{req.lastDocInBatch()}}!
 Ouch!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

