Date: Thu, 12 Nov 2015 13:28:27 +0100
Subject: Re: debugging growing index size
From: Rob Audenaerde
To: "java-user@lucene.apache.org"

Curious indeed!

I will turn on IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the logs.
I'll get back to you with them in a day, hopefully.

Thanks for the extra logging!

-Rob

On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Hmm, curious.
>
> I looked at the [large] infoStream output and I see segment _3ou7
> present on init of IW, a few getReader calls referencing it, then a
> forceMerge that indeed merges it away, yet I do NOT see IW attempting
> deletion of its files.
>
> And indeed I see plenty (too many: many times per second?) of commits
> after that, so the index itself is no longer referencing _3ou7.
>
> If you are failing to close all NRT readers then I would expect _3ou7
> to be in the lsof output, but it's not.
>
> The NRT reader's close method has logic that notifies IndexWriter when
> it's done "needing" the files, to emulate "delete on last close"
> semantics for filesystems like HDFS that don't do that ... it's
> possible something is wrong here.
>
> Can you set the (public, static) boolean
> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
> log? This causes IW to log the ref count of each file it's tracking
> ...
>
> I'll also add a bit more verbosity to IW when NRT readers are opened
> and closed, for 5.4.0.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde wrote:
>
> > Hi all,
> >
> > I'm still debugging the growing index size. I think closing index readers
> > might help (work in progress), but I can't really see them holding on to
> > files (at least, using lsof). Restarting the application sheds some light:
> > I see logging about files that are no longer referenced.
> >
> > What I see is that there are files in the index directory that seem to
> > no longer be referenced.
> >
> > I put the output of the infoStream online, because it is rather big (30MB
> > gzipped): http://www.audenaerde.org/lucene/merges.log.gz
> >
> > Output of lsof (executed 'sudo lsof *' in the index directory). This is
> > on a CentOS box (maybe that influences stuff as well?).
> >
> > COMMAND   PID USER     FD TYPE DEVICE   SIZE/OFF     NODE NAME
> > java    30581 apache  mem  REG  253,0 3176094924 18880508 _4gs5_Lucene50_0.dvd
> > java    30581 apache  mem  REG  253,0  505758610 18880546 _4gs5.fdt
> > java    30581 apache  mem  REG  253,0  369563337 18880631 _4gs5_Lucene50_0.tim
> > java    30581 apache  mem  REG  253,0  176344058 18880623 _4gs5_Lucene50_0.pos
> > java    30581 apache  mem  REG  253,0  378055201 18880606 _4gs5_Lucene50_0.doc
> > java    30581 apache  mem  REG  253,0  372579599 18880400 _4i5a_Lucene50_0.dvd
> > java    30581 apache  mem  REG  253,0   82017447 18880748 _4g37.cfs
> > java    30581 apache  mem  REG  253,0   85376507 18880721 _4fb3.cfs
> > java    30581 apache  mem  REG  253,0  363493917 18880533 _4ct1_Lucene50_0.dvd
> > java    30581 apache  mem  REG  253,0    9421892 18880806 _4gjc.cfs
> > java    30581 apache  mem  REG  253,0   76877461 18880553 _4ct1.fdt
> > java    30581 apache  mem  REG  253,0   46271330 18880661 _4ct1_Lucene50_0.tim
> > java    30581 apache  mem  REG  253,0   26911387 18880653 _4ct1_Lucene50_0.pos
> > java    30581 apache  mem  REG  253,0   54678249 18880568 _4ct1_Lucene50_0.doc
> > java    30581 apache  mem  REG  253,0   76556587 18880328 _4i5a.fdt
> > java    30581 apache  mem  REG  253,0   45032159 18880389 _4i5a_Lucene50_0.tim
> > java    30581 apache  mem  REG  253,0   26486772 18880388 _4i5a_Lucene50_0.pos
> > java    30581 apache  mem  REG  253,0   55411002 18880362 _4i5a_Lucene50_0.doc
> > java    30581 apache  mem  REG  253,0   70484185 18880340 _4hkn.cfs
> > java    30581 apache  mem  REG  253,0   10873921 18880324 _4gpz.cfs
> > java    30581 apache  mem  REG  253,0   17230506 18880524 _4i11.cfs
> > java    30581 apache  mem  REG  253,0    6706969 18880575 _4i0t.cfs
> > java    30581 apache  mem  REG  253,0   15135578 18880624 _4i0i.cfs
> > java    30581 apache  mem  REG  253,0   15368310 18880717 _4hzp.cfs
> > java    30581 apache  mem  REG  253,0    5146140 18880583 _4hze.cfs
> > java    30581 apache  mem  REG  253,0    2917380 18880411 _4gs5.nvd
> > java    30581 apache  mem  REG  253,0    6871469 18880732 _4hod.cfs
> > java    30581 apache  mem  REG  253,0    2860341 18880495 _4i84.cfs
> > java    30581 apache  mem  REG  253,0     835726 18880660 _4i7z.cfs
> > java    30581 apache  mem  REG  253,0    1005595 18880648 _4i7w.cfs
> > java    30581 apache  mem  REG  253,0    5639672 18880401 _4i4o.cfs
> > java    30581 apache  mem  REG  253,0    4388371 18880440 _4i4a.cfs
> > java    30581 apache  mem  REG  253,0    1151845 18880512 _4i7v.cfs
> > java    30581 apache  mem  REG  253,0     941773 18880613 _4i7x.cfs
> > java    30581 apache  mem  REG  253,0     984023 18880588 _4i7o.cfs
> > java    30581 apache  mem  REG  253,0    1790005 18880619 _4i7y.cfs
> > java    30581 apache  mem  REG  253,0     466371 18880515 _4ct1.nvd
> > java    30581 apache  mem  REG  253,0     723280 18880573 _4i7q.cfs
> > java    30581 apache  mem  REG  253,0     806289 18880517 _4i7h.cfs
> > java    30581 apache  mem  REG  253,0      17362 18880520 _4i9s.cfs
> > java    30581 apache  mem  REG  253,0     698362 18880531 _4i9r.cfs
> > java    30581 apache  mem  REG  253,0     483215 18880406 _4i5a.nvd
> > java    30581 apache  mem  REG  253,0      14110 18880416 _4i9v.cfs
> > java    30581 apache  mem  REG  253,0       6121 18880412 _4i9t.cfs
> > java    30581 apache 30wW  REG  253,0          0 18877901 write.lock
> >
> > Output of some of the biggest files in the index directory:
> >
> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
> >
> > Note that the 3ou7 and 3k3a segments no longer appear to be in use?
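The question of whether _3ou7 and _3k3a are still in use can be checked mechanically by grouping the directory listing by segment and comparing it with the lsof listing. What follows is a minimal sketch, assuming a segment's name is the leading "_xxxx" part of each file name up to the next "_" or "."; the class and method names are hypothetical, not Lucene API. The sizes are copied from the ls excerpt above, and the open-file set holds the lsof entries for the segments that appear in that excerpt. Absence from lsof only shows that no process holds a handle; whether the latest commit still references a segment is recorded in the segments_N file, which this sketch does not read.

```java
import java.util.*;

public class SegmentFileGrouper {

    // Hypothetical helper: returns the segment a Lucene index file belongs to,
    // e.g. "_4gs5_Lucene50_0.dvd" -> "_4gs5" and "_3ou7.cfs" -> "_3ou7".
    // Assumes the name is the leading "_xxxx" up to the next '_' or '.';
    // returns null for non-segment files such as "write.lock" or "segments_N".
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.', 1);
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) {
            end = Math.min(end, dot);
        }
        if (underscore >= 0) {
            end = Math.min(end, underscore);
        }
        return fileName.substring(0, end);
    }

    // Sums on-disk bytes per segment, for segments with no file in openFiles.
    static SortedMap<String, Long> unopenedSegments(Map<String, Long> onDisk, Set<String> openFiles) {
        Set<String> openSegments = new HashSet<>();
        for (String file : openFiles) {
            String segment = segmentOf(file);
            if (segment != null) {
                openSegments.add(segment);
            }
        }
        SortedMap<String, Long> unopened = new TreeMap<>();
        for (Map.Entry<String, Long> entry : onDisk.entrySet()) {
            String segment = segmentOf(entry.getKey());
            if (segment != null && !openSegments.contains(segment)) {
                unopened.merge(segment, entry.getValue(), Long::sum);
            }
        }
        return unopened;
    }

    public static void main(String[] args) {
        // Sizes in bytes, copied from the "ls" excerpt in the thread.
        Map<String, Long> onDisk = new LinkedHashMap<>();
        onDisk.put("_4fjn.cfs", 358684577L);
        onDisk.put("_4ct1_Lucene50_0.dvd", 363493917L);
        onDisk.put("_4gs5_Lucene50_0.tim", 369563337L);
        onDisk.put("_4i5a_Lucene50_0.dvd", 372579599L);
        onDisk.put("_4gs5_Lucene50_0.doc", 378055201L);
        onDisk.put("_3ou7.cfs", 427401813L);
        onDisk.put("_4gs5.fdt", 505758610L);
        onDisk.put("_3k3a_Lucene50_0.dvd", 1107391579L);
        onDisk.put("_4gs5_Lucene50_0.dvd", 3176094924L);

        // lsof entries for the segments that also appear in the ls excerpt.
        Set<String> openFiles = new HashSet<>(Arrays.asList(
                "_4gs5_Lucene50_0.dvd", "_4gs5.fdt", "_4gs5_Lucene50_0.tim",
                "_4gs5_Lucene50_0.doc", "_4ct1_Lucene50_0.dvd", "_4i5a_Lucene50_0.dvd"));

        // Prints {_3k3a=1107391579, _3ou7=427401813, _4fjn=358684577}
        System.out.println(unopenedSegments(onDisk, openFiles));
    }
}
```

With the thread's figures it flags _3ou7 and _3k3a, and also _4fjn, which is not in the lsof listing either; whether that matters depends on how far apart the two listings were taken.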
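Mike's explanation amounts to reference counting: roughly, IndexWriter keeps a count per file of the commits and open NRT readers that still need it, closing a reader decrements that count, and a file is deleted only when its count reaches zero. The sketch below is a hypothetical, simplified model of that idea for illustration only. The class and method names are invented, and it is not Lucene's IndexFileDeleter (it ignores deletion policies, retries, and filesystem details). The verbose flag only loosely mirrors the kind of per-file ref-count logging VERBOSE_REF_COUNTS is described as enabling.

```java
import java.util.*;

// Hypothetical, simplified model of per-file reference counting.
// NOT Lucene's IndexFileDeleter; names and behaviour are illustrative only.
public class RefCountedFiles {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> deleted = new TreeSet<>();
    private final boolean verbose;

    RefCountedFiles(boolean verbose) {
        this.verbose = verbose;
    }

    // A commit point or an open NRT reader starts needing these files.
    void incRef(Collection<String> files) {
        for (String file : files) {
            int count = refCounts.merge(file, 1, Integer::sum);
            log("incRef", file, count);
        }
    }

    // A commit point is dropped or a reader is closed. A file whose count
    // reaches zero is recorded as deleted (a stand-in for removing it from disk).
    void decRef(Collection<String> files) {
        for (String file : files) {
            Integer current = refCounts.get(file);
            if (current == null) {
                throw new IllegalStateException("decRef of untracked file " + file);
            }
            int count = current - 1;
            log("decRef", file, count);
            if (count == 0) {
                refCounts.remove(file);
                deleted.add(file);
            } else {
                refCounts.put(file, count);
            }
        }
    }

    int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }

    Set<String> deletedFiles() {
        return Collections.unmodifiableSet(deleted);
    }

    // Prints one line per count change when verbose is on.
    private void log(String op, String file, int count) {
        if (verbose) {
            System.out.println(op + " " + file + " -> " + count);
        }
    }

    public static void main(String[] args) {
        RefCountedFiles tracker = new RefCountedFiles(true);
        List<String> segmentFiles = Arrays.asList("_3ou7.cfs");

        tracker.incRef(segmentFiles); // the commit that contains _3ou7
        tracker.incRef(segmentFiles); // an NRT reader opened on that commit
        tracker.decRef(segmentFiles); // merged away; the old commit is dropped
        System.out.println("after merge: " + tracker.deletedFiles()); // [] : reader still counts

        tracker.decRef(segmentFiles); // the reader is closed and notifies the tracker
        System.out.println("after close: " + tracker.deletedFiles()); // [_3ou7.cfs]
    }
}
```

In this model, if a reader's close notification were lost, the final decRef would never happen: the file handle could be gone from lsof while the count stays at 1 and the file stays on disk. That is one way, not a confirmed diagnosis, to get the symptom described in this thread.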