www-infrastructure-issues mailing list archives

From "Henri Yandell (JIRA)" <j...@apache.org>
Subject [jira] Closed: (INFRA-1334) Intermittent Out Of Memory failures
Date Thu, 11 Oct 2007 02:19:51 GMT

     [ https://issues.apache.org/jira/browse/INFRA-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed INFRA-1334.
--------------------------------

    Resolution: Fixed

All dealt with - INFRA-1381 records the need to patch the rpc plugin.

> Intermittent Out Of Memory failures
> -----------------------------------
>
>                 Key: INFRA-1334
>                 URL: https://issues.apache.org/jira/browse/INFRA-1334
>             Project: Infrastructure
>          Issue Type: Bug
>      Security Level: public(Regular issues) 
>          Components: JIRA
>            Reporter: Ted Husted
>            Assignee: Henri Yandell
>
> [Summarized from an email thread]
> Occasionally, JIRA on Brutus will fail. Sometimes it seems to "spin out of control",
knocking off other systems, including Bugzilla. At other times, though, the other JIRA
instances have remained responsive even when one fails.
> It's possible that we are seeing both OOMs and PermGen OOMs. The most recent set were
due to some kind of client tool sending a malformed request that ended up requesting huge
amounts of data and consuming large amounts of memory. It's not clear whether the memory
usage is a leak or comes from something that isn't coded to pipeline.
> Client tools account for half of the current JIRA traffic.
>  % wc -l issues_weblog 
>  738289
>  % grep -c soapservice issues_weblog 
>  372529
> The underlying problem may be that ill-conceived search requests can consume too much
memory and bring the system down. (For other services, like Subversion, we've found ways to
keep the CPUs in check.)
> These requests are being made by tools like mylyn, which some developers find very useful.
The tools increase integration between development environments and the issue tracker. A page
like this
>  * http://velocity.apache.org/engine/releases/velocity-1.5/jira-report.html
> is being built straight from the issue tracker.
> It is important that we take steps: we can't trust a system that is starting to throw
OOMs, since that could potentially lead to bad data.
> Since there seems to be more than one type of error condition, whenever a JIRA instance
goes down, the first thing we should check is whether the other instances are responsive. We
need, for example, to isolate issues with the httpd front end.
> We might want to look at setting up the Java Service Wrapper (http://wrapper.tanukisoftware.org)
for our JVM(s); it's available for both Linux and Solaris. One thing the wrapper can do is
monitor the JVM output for such things as OOM messages. Having the wrapper shut the JVM down
and send a mail might help us isolate the problem. If possible, it might also be helpful for
the wrapper to monitor JMX events within the monitored JVM, so that it can take proactive
action.
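
As a rough sketch, the output-monitoring idea above maps onto the wrapper's filter-trigger
feature; this is a minimal wrapper.conf fragment, assuming a 3.x-era Java Service Wrapper
(only the two filter properties are from the wrapper's documented configuration; everything
else in a real config is omitted here):

    # Watch the JVM's console output for OutOfMemoryError messages
    # and restart the JVM when one appears.
    wrapper.filter.trigger.1=java.lang.OutOfMemoryError
    wrapper.filter.action.1=RESTART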
> The JIRA searches are based on Lucene. We might want to consider asking one of our ASF
Lucene experts to review the Atlassian source code. 
> Aside from searches, another possible cause is DNS resolution via InetAddress. There is
a known issue with Java's DNS cache that can leak memory for each IP address that hits the
server. Any server application that does DNS resolution should use dnsjava, which has a
proper TTL-managed cache. There is also a JVM-specific workaround that can be used in the
meantime.
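
The JVM-specific workaround alluded to above is presumably the networkaddress.cache.ttl
security property, which bounds how long InetAddress caches lookup results; a hedged
sketch follows (the 60/10-second TTL values are illustrative placeholders, not values
from this thread):

    import java.security.Security;

    public class DnsCacheTtl {
        public static void main(String[] args) {
            // Must run before the application's first InetAddress lookup.
            // Cap successful lookups at 60 seconds instead of letting the
            // JVM cache them indefinitely (the default under a security
            // manager in JVMs of this era).
            Security.setProperty("networkaddress.cache.ttl", "60");
            // Bound caching of failed lookups as well.
            Security.setProperty("networkaddress.cache.negative.ttl", "10");
            System.out.println(Security.getProperty("networkaddress.cache.ttl"));
        }
    }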

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

