lucene-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption
Date Mon, 30 Jul 2012 10:08:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424785#comment-13424785 ]

Markus Jelsma commented on SOLR-3685:
-------------------------------------

Hi,

1. Yes, but we allow only one searcher to be warmed at a time. That resource usage also
counts against the Java heap, so it cannot explain 5x the heap size being allocated.

2. Yes, I'll open a new issue and refer to this one.

3. Well, in some logs I clearly see a core attempting to download, and judging from the
multiple index directories it really does. I am very sure no updates have been added to the
cluster for a long time, yet it still attempts to recover. Below is a core recovering.

{code}
2012-07-30 09:48:36,970 INFO [solr.cloud.ZkController] - [main] - : We are http://nl2.index.openindex.io:8080/solr/openindex_a/ and leader is http://nl1.index.openindex.io:8080/solr/openindex_a/
2012-07-30 09:48:36,970 INFO [solr.cloud.ZkController] - [main] - : No LogReplay needed for core=openindex_a baseURL=http://nl2.index.openindex.io:8080/solr
2012-07-30 09:48:36,970 INFO [solr.cloud.ZkController] - [main] - : Core needs to recover:openindex_a
{code}
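
To check how often this happens across a whole log file, the recovery lines can be picked out
mechanically. The following is a minimal sketch, not part of Solr: the regular expression is
derived only from the three log lines above, and the function name `cores_needing_recovery`
is made up for illustration.

```python
import re

# Pattern derived from the ZkController lines quoted above; other Solr
# versions or log4j layouts may format these lines differently.
RECOVER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"\[solr\.cloud\.ZkController\] .*Core needs to recover:(?P<core>\S+)"
)


def cores_needing_recovery(log_lines):
    """Return (timestamp, core) pairs for every 'Core needs to recover' line."""
    hits = []
    for line in log_lines:
        match = RECOVER_RE.match(line)
        if match:
            hits.append((match.group("ts"), match.group("core")))
    return hits


# Two of the lines from the excerpt above, unwrapped.
sample = [
    "2012-07-30 09:48:36,970 INFO [solr.cloud.ZkController] - [main] - : "
    "No LogReplay needed for core=openindex_a "
    "baseURL=http://nl2.index.openindex.io:8080/solr",
    "2012-07-30 09:48:36,970 INFO [solr.cloud.ZkController] - [main] - : "
    "Core needs to recover:openindex_a",
]

print(cores_needing_recovery(sample))
```

Feeding it the lines of info.log (for example `open("info.log")`) would show whether every
restart triggers a recovery or only some of them.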

Something noteworthy may be that, for some reason, the index versions of the cores and their
replicas don't match. After a restart the generation of a core is also different, while it
shouldn't have changed. The size in bytes is also slightly different (~20 bytes).
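
The version and generation mismatch can be checked systematically once the numbers are
collected per replica. A minimal sketch follows, under stated assumptions: the values are
gathered by hand or from each core's replication handler (`command=indexversion`, which as
far as I know reports both numbers), the function name `mismatched_shards` is hypothetical,
and the version and generation numbers in the sample are invented, not taken from our cluster.

```python
def mismatched_shards(reports):
    """Return the shards whose replicas disagree on (indexversion, generation).

    `reports` maps a shard name to {replica URL: (indexversion, generation)}.
    """
    bad = {}
    for shard, replicas in reports.items():
        # All replicas of a shard should report one and the same pair.
        if len(set(replicas.values())) > 1:
            bad[shard] = replicas
    return bad


# Invented numbers, for illustration only.
reports = {
    "openindex_a": {
        "http://nl1.index.openindex.io:8080/solr/openindex_a/": (1343640000000, 12),
        "http://nl2.index.openindex.io:8080/solr/openindex_a/": (1343640000001, 13),
    },
    "openindex_b": {
        "replica-1": (1343640000500, 7),
        "replica-2": (1343640000500, 7),
    },
}

for shard, replicas in mismatched_shards(reports).items():
    print(shard, "replicas disagree:", sorted(replicas.values()))
```

Running the same comparison before and after a restart would also show whether the
generation changes without any updates having been sent.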

The main concern is that Solr consumes 5x the allocated heap space in RESident memory.
Caches and such live in the heap, and the MMapped index dir should count towards VIRTual
memory and not cause the kernel to kill the process. I'm not yet sure what's going on here.
Also, according to Uwe, virtual memory should not be more than 2-3 times the index size. In
our case we see ~800MB of virtual memory for two 26MB cores right after startup.
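
To make the comparison concrete, the figures above can be put next to what Linux reports in
`/proc/<pid>/status` (fields `VmSize` and `VmRSS`, in kB). This is a rough sketch under
stated assumptions: the sample text is constructed to match the numbers in this comment
(~800MB virtual, RES at 5x a 98MB heap) and is not a real capture, and the names
`parse_status` and `memory_report` are made up for illustration.

```python
KB_PER_MB = 1024  # /proc/<pid>/status reports sizes in kB


def parse_status(text):
    """Extract VmSize and VmRSS, converted to MB, from /proc/<pid>/status text."""
    sizes = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            sizes[key] = int(rest.split()[0]) / KB_PER_MB
    return sizes


def memory_report(sizes, heap_mb, index_mb, virt_factor=3):
    """Compare resident memory to the heap and virtual memory to the index size."""
    return {
        "rss_over_heap": sizes["VmRSS"] / heap_mb,
        "virt_over_index": sizes["VmSize"] / index_mb,
        # Rule of thumb quoted above: virtual memory <= 2-3x the index size.
        "virt_within_rule": sizes["VmSize"] <= virt_factor * index_mb,
    }


# Constructed sample matching the figures in this comment, not a real capture.
SAMPLE_STATUS = "Name:\tjava\nVmSize:\t  819200 kB\nVmRSS:\t  501760 kB\n"

sizes = parse_status(SAMPLE_STATUS)
report = memory_report(sizes, heap_mb=98, index_mb=2 * 26)
print(sizes)
print(report)
```

With these inputs the virtual size comes out at roughly 15x the combined index size, far
above the 2-3x rule of thumb. Note that resident memory always exceeds the heap somewhat,
since the JVM also needs non-heap memory; the point is only that 5x is far more than expected.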

We have only allocated 98MB to the heap for now, which is enough for such a small index.
                
> solrcloud crashes on startup due to excessive memory consumption
> ----------------------------------------------------------------
>
>                 Key: SOLR-3685
>                 URL: https://issues.apache.org/jira/browse/SOLR-3685
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java), SolrCloud
>    Affects Versions: 4.0-ALPHA
>         Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 4.1
>
>         Attachments: info.log
>
>
> There's a serious problem with restarting nodes: old or unused index directories are not
> cleaned up, replication starts suddenly, and Java is killed by the OS due to excessive
> memory allocation. Since SOLR-1781 was fixed, index directories get cleaned up when a node
> is restarted cleanly; however, old or unused index directories still pile up if Solr
> crashes or is killed by the OS, which is what happens here.
> We have a six-node 64-bit Linux test cluster with each node having two shards. There's
> 512MB RAM available and no swap. Each index is roughly 27MB, so about 50MB per node; this
> fits easily and works fine. However, if a node is restarted, Solr will consistently crash
> because it immediately eats up all RAM. If swap is enabled, Solr will eat an additional few
> hundred MB right after startup.
> This cannot be solved by restarting Solr; it will just crash again and leave index
> directories in place until the disk is full. The only way I can restart a node safely is to
> delete the index directories and have it replicate from another node. If I then restart the
> node it will crash almost consistently.
> I'll attach a log of one of the nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

