lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud
Date Thu, 01 Aug 2013 20:25:49 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726831#comment-13726831 ]

Erick Erickson commented on SOLR-5081:
--------------------------------------

Yeah, that is odd. The stack traces you sent showed no deadlocks and nothing else interesting
either. I suspect it's worth pursuing whether requests are reaching Solr at all.
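For anyone repeating this check on a fresh dump, a quick way to see what the threads are doing is to tally the `java.lang.Thread.State:` lines that jstack writes under each thread and to look for jstack's own "Found one Java-level deadlock" banner. This is a minimal sketch that assumes a HotSpot jstack-format dump such as the attached threads.txt; the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the state line jstack writes under each thread, e.g.
#    java.lang.Thread.State: WAITING (parking)
STATE_RE = re.compile(r"java\.lang\.Thread\.State: ([A-Z_]+)")

# jstack prints this banner when it detects a Java-level deadlock.
DEADLOCK_BANNER = "Found one Java-level deadlock"


def summarize_thread_dump(text):
    """Return (Counter of thread states, whether jstack reported a deadlock)."""
    states = Counter(STATE_RE.findall(text))
    return states, DEADLOCK_BANNER in text


def summarize_file(path):
    """Read a saved dump (e.g. threads.txt) and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return summarize_thread_dump(f.read())
```

A dump with no deadlock banner but many BLOCKED or WAITING request threads would still point at threads stuck waiting on something, which is different from a hang where requests never arrive at all.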

Hmmmm, here's a blunt-instrument test for when the cluster is hung: what happens if you submit a
query directly to one of the nodes? Does it respond, or do you see anything in the Solr log
on that node? Tip: adding &distrib=false to the _query_ keeps that node from sending sub-queries
to other shards.
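The single-node query suggested above can be scripted so that it fails fast instead of hanging along with the cluster. This is a sketch, not a Solr client: the host, port, and core name (`collection1`, the stock example's name; a SolrCloud core may instead be named something like `collection1_shard1_replica1`) are assumptions, and only the standard `q`, `rows`, `wt`, and `distrib` request parameters are used.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def local_query_url(node_base, core, q="*:*"):
    """Build a /select URL that asks this node to answer from its own core only."""
    # distrib=false stops the node from fanning sub-queries out to other shards.
    params = {"q": q, "distrib": "false", "rows": "0", "wt": "json"}
    return f"{node_base.rstrip('/')}/{core}/select?{urlencode(params)}"


def probe(url, timeout=10.0):
    """Return the HTTP status, or None if the node never answers in time."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code   # Solr answered, just not with a 2xx status
    except OSError:     # refused, unreachable, or no reply before the timeout
        return None
```

For example, `probe(local_query_url("http://solr-node1:8983/solr", "collection1"))` (hypothetical host) returning None while the cluster is hung would suggest the node is not serving even purely local requests; a 200 would push suspicion toward the distributed update path.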

And I wonder what happens if you use post.jar (it comes with the example) to send
a doc to Solr while it's hung. Anything?
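A scripted stand-in for the post.jar test is to send one throwaway document with a timeout. Note that this swaps in a JSON update posted to `/update` (accepted by Solr 4.x's update handler when sent as `application/json`) instead of the XML files post.jar usually sends. The host, the core name, and a uniqueKey field called `id` (true of the stock example schema) are all assumptions, and the document is left uncommitted, so it has to be deleted afterwards if it matters.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_probe_update(node_base, core, doc_id):
    """Build a JSON update request carrying one throwaway document."""
    # Assumes the schema's uniqueKey is "id", as in the stock example schema.
    body = json.dumps([{"id": doc_id}]).encode("utf-8")
    url = f"{node_base.rstrip('/')}/{core}/update"
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def send(req, timeout=10.0):
    """Return the HTTP status, or None if nothing comes back in time."""
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code   # Solr rejected the update but did respond
    except OSError:     # refused, unreachable, or timed out
        return None
```

Comparing this against the query probe on the same node helps separate "the node is dead" from "only the update path is stuck".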

Clearly I'm grasping at straws here, but I'm kind of out of good ideas.
                
> Highly parallel document insertion hangs SolrCloud
> --------------------------------------------------
>
>                 Key: SOLR-5081
>                 URL: https://issues.apache.org/jira/browse/SOLR-5081
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.3.1
>            Reporter: Mike Schrag
>         Attachments: threads.txt
>
>
> If I do a highly parallel document load from a Hadoop cluster into an 18-node SolrCloud
> cluster, I can deadlock Solr every time.
> The ulimits on the nodes are:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1031181
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515590
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think this is
> a Solr problem and not a ZooKeeper (ZK) one.
> I'll attach the stack trace from one of the servers.
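The open-file figure above (around 4000 in use against an `open files` limit of 32768) can be rechecked against the running Solr JVM itself, since `ulimit -a` reports the current shell's limits, which may differ from those of a daemon started another way. This is a Linux-only sketch that reads `/proc/<pid>/fd` and `/proc/<pid>/limits`; the Solr PID is something you would supply, and reading another user's process typically requires running as that user or root.

```python
import os


def open_fd_count(pid="self"):
    """Count a process's open file descriptors (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid="self"):
    """Soft 'Max open files' limit for a process, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> files
                return None if soft == "unlimited" else int(soft)
    return None


def fd_headroom(pid="self"):
    """Return (open, soft_limit, remaining); remaining is None when unlimited."""
    used = open_fd_count(pid)
    limit = max_open_files(pid)
    return used, limit, (None if limit is None else limit - used)
```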

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


