lucene-dev mailing list archives

From "Mikhail Khludnev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-9647) CollectionsAPIDistributedZkTest got stuck, reproduces failure
Date Wed, 19 Oct 2016 14:04:58 GMT

     [ https://issues.apache.org/jira/browse/SOLR-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-9647:
-----------------------------------
    Attachment: SOLR-9647.patch

[^SOLR-9647.patch] addresses the case where CollectionsAPIDistributedZkTest.testCollectionsAPIAddRemoveStress()
spawns too many cores: it hangs and floods the heap until it hits an OutOfMemoryError (OOME). The reasons are:
* when many cores register with the MBean server, all of them hang on some synchronized policy
check inside JMX.
* the default number of version buckets is huge, even though that test method doesn't send any updates.
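To give a sense of scale for the second point, here is a rough count of version-bucket objects held across all cores. It is a sketch under stated assumptions, not taken from the patch: it assumes the Solr 6.x default of 65536 buckets per core (the `numVersionBuckets` setting of the update log) and one bucket object allocated per bucket; the core counts and the trimmed value of 256 are hypothetical. It counts objects only and makes no claim about bytes.

```java
// Rough scale estimate: total version-bucket objects held across all cores.
// Assumptions (not from the patch): 65536 buckets per core by default and one
// bucket object per bucket; core counts and the trimmed value are hypothetical.
public class VersionBucketEstimate {

    static final int DEFAULT_NUM_VERSION_BUCKETS = 65536; // assumed Solr 6.x default
    static final int TRIMMED_NUM_VERSION_BUCKETS = 256;   // hypothetical trimmed value

    // Total bucket objects for the given number of cores and buckets per core.
    static long totalBuckets(int cores, int bucketsPerCore) {
        return (long) cores * bucketsPerCore;
    }

    public static void main(String[] args) {
        int[] coreCounts = {1, 10, 100}; // hypothetical core counts
        for (int cores : coreCounts) {
            System.out.printf("%d cores -> %d buckets (default), %d buckets (trimmed)%n",
                cores,
                totalBuckets(cores, DEFAULT_NUM_VERSION_BUCKETS),
                totalBuckets(cores, TRIMMED_NUM_VERSION_BUCKETS));
        }
    }
}
```

With 100 cores the default gives 6,553,600 bucket objects, none of which this test ever uses, which is why trimming them in a stress-only configSet is cheap and harmless.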

This patch introduces {{solrconfig-slim.xml}} in a {{stressconf}} cloud configset, with JMX disabled
and the version buckets trimmed. It doesn't address the speculations in the comment above.

One more change was required. There is a code branch that picks up the only existing configSet
when none is specified explicitly. But testCollectionsAPIAddRemoveStress now requires
an alternative configSet, which is why it's skipped with 50% probability.
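A minimal sketch of the fallback branch described above, and of one reading of why the stress test is skipped half the time: while the stress test's extra configSet is present, the "only existing configSet" branch cannot be reached, so running the stress test in only about half of the runs keeps that branch covered. The class and method names (`ConfigSetFallback`, `resolveConfigSet`, `runStressTest`) are hypothetical illustrations, not Solr's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names are illustrative, not Solr's actual API.
public class ConfigSetFallback {

    // If no configSet is named explicitly and exactly one exists, use it;
    // otherwise the caller must choose explicitly.
    static String resolveConfigSet(String requested, List<String> available) {
        if (requested != null) {
            return requested; // an explicit choice always wins
        }
        if (available.size() == 1) {
            return available.get(0); // the "only existing configSet" branch
        }
        throw new IllegalArgumentException(
            "No configSet specified and " + available.size() + " configSets exist");
    }

    // Assumed interpretation: run the stress test (which needs a second
    // configSet such as "stressconf") only about half the time, so the
    // single-configSet branch is still exercised in the other runs.
    static boolean runStressTest(Random random) {
        return random.nextBoolean(); // ~50% probability
    }

    public static void main(String[] args) {
        System.out.println(resolveConfigSet(null, Arrays.asList("conf1")));              // conf1
        System.out.println(resolveConfigSet("stressconf",
            Arrays.asList("conf1", "stressconf")));                                     // stressconf
        System.out.println(runStressTest(new Random()) ? "run stress test" : "skip stress test");
    }
}
```

Throwing when several configSets exist and none is named keeps the fallback unambiguous: it only fires when there is exactly one candidate.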

Is it worth committing?

> CollectionsAPIDistributedZkTest got stuck, reproduces failure
> -------------------------------------------------------------
>
>                 Key: SOLR-9647
>                 URL: https://issues.apache.org/jira/browse/SOLR-9647
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mikhail Khludnev
>         Attachments: SOLR-9647.patch
>
>
> I had to kill https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1129/
> because it "Took 1 day 12 hr on lucene".
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:08:30, stalled for 48990s at: CollectionsAPIDistributedZkTest.test
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:09:30, stalled for 49050s at: CollectionsAPIDistributedZkTest.test
> It just got stuck. Then I ran it locally: it passes from Eclipse, but fails when I
> run it from the command line with ant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


