Subject: Re: Troubleshooting solr errors
To: solr-user@lucene.apache.org
From: Shawn Heisey
Date: Mon, 1 May 2017 08:15:50 -0600

On 4/25/2017 12:05 PM, Daniel Miller wrote:
> The problem isn't a particular email message - I get a cascade of
> those errors (every time a new message is received) once the server
> "breaks". The fix is to restart the server. I did find a Java heap
> error in the log - so I've increased the memory allocation (now to
> -Xms512m -Xmx2048m). I had thought that a heap failure would result
> in "simple" termination - and that systemd would restart it
> appropriately - but obviously I'm missing something.

Erick covered some of this already:

The init script that the service installer script installs on a
non-Windows system can start Solr, but it will not automatically
restart it if it dies. That would require you to write something
special, probably a very custom systemd service specification, rather
than use the init script. Automatically restarting on death is not a
good idea -- it is VERY likely that whatever caused the death is going
to happen again.

Another detail, at least on non-Windows systems, is that recent Solr
versions include a script that kills the process on OutOfMemoryError
(OOME). This is done because program operation is completely
unpredictable after that error occurs -- we have no way of knowing what
Solr will do. There's an issue in Jira to add OOME killing to the
Windows script.

FYI, the stack trace from an OutOfMemoryError regarding the heap is
highly unlikely to give you anything useful about why the process ran
out of memory, since *any* memory allocation in any software running in
the JVM can trigger the error.

Other errors besides OOME should never terminate Solr unless there's an
enormous bug somewhere. That bug might be in Java itself, or even the
OS.

Thanks,
Shawn
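
As a rough illustration of the "something special" mentioned above,
here is a minimal Python sketch of a supervisor that restarts a
foreground process only a limited number of times, with growing delays,
so that a recurring failure (such as a repeated OutOfMemoryError kill)
does not turn into an endless restart loop. It is not part of Solr. The
function name `supervise` and its parameters are hypothetical, and the
command passed in is an assumption about how the service is run in the
foreground on your system.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=5.0):
    """Run cmd; after a non-zero exit, restart it at most max_restarts times.

    Hypothetical helper for illustration only. Returns the list of exit
    codes observed, one entry per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child process exits and returns its exit code.
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        if attempt < max_restarts:
            # Linear backoff: wait longer after each consecutive failure.
            time.sleep(backoff_seconds * (attempt + 1))
    return exit_codes


# Example call with a stand-in child process that always fails:
#   supervise([sys.executable, "-c", "import sys; sys.exit(1)"],
#             max_restarts=2, backoff_seconds=1.0)
```

The cap on restarts is the important part: if the same failure keeps
happening, the supervisor gives up and leaves the logs for someone to
look at instead of restarting forever.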