From: Simon Willnauer
To: java-user@lucene.apache.org
Date: Mon, 2 Jan 2012 21:03:44 +0100
Subject: Re: Help running out of files

hey charlie,

there are a couple of wrong assumptions in your last email, mostly
related to merging. mergefactor = 10 doesn't mean that you end up with
one file, nor is it related to files at all. My first guess is that you
are using the compound file format (CFS), so each segment corresponds to
a single file. The merge factor relates to segments and is responsible
for triggering segment merges by their size (either in bytes or in
documents). For more details see this blog:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

If you are using CFS, one segment is one file. In 3.1, CFS is only used
if the target segment is smaller than the noCFSRatio allows. That
prevents segments that are bigger than a fraction of the existing index
(by default 0.1 -> 10%) from being packed into CFS. This means your
index might create non-CFS segments with multiple files (10 in the worst
case.... maybe I missed one but anyway...), which means the number of
open files increases.

This is only a guess since I don't know what you are doing with your
index readers etc. Which platform are you on and what is the file
descriptor limit? In general it's ok to raise the FD limit on your OS
and just let lucene do its job. If you are restricted in any way you can
set LogMergePolicy#setNoCFSRatio(double) to 1.0 and see if you are
still seeing the problem.

About commit vs. close - in general it's not a good idea to close your IW
at all. I'd keep it open as long as you can and commit if needed. Even
optimize is somewhat overrated and should be used with care or not at
all...
(here is another writeup regarding optimize:
http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you )

hope that helps,

simon

On Mon, Jan 2, 2012 at 5:38 PM, Charlie Hubbard wrote:
> I'm beginning to think there is an issue with 3.1 that's causing this.
> After looking over my code again I realized that the mechanism that does the
> indexing hasn't changed, and the index IS being closed between cycles.
> Even when using push vs pull. This code used to work on 2.x lucene, but I
> had to upgrade it. It had been very stable under 2.x, but after upgrading
> to 3.1 I've started seeing this problem. I double checked the code doing
> the indexing, and it hasn't changed since I upgraded to 3.1. So the
> constant in this equation is mostly my code. What's different is 3.1.
> Furthermore, when new documents are pulled in through the
> old mechanism the open file count continues to rise. Over a 24 hour
> period it's grown by +296 files, but only 10 or 12 documents were indexed.
>
> So is this a known issue? Should I upgrade to a newer version to fix this?
>
> Thanks
> Charlie
>
> On Sat, Dec 31, 2011 at 1:01 AM, Charlie Hubbard wrote:
>
>> I have a program I recently converted from a pull scheme to a push scheme.
>> So previously I was pulling down the documents I was indexing, and when I
>> was done I'd close the IndexWriter at the end of each iteration. Now that
>> I've converted to a push scheme I'm sent the documents to index, and I
>> write them. However, this means I'm not closing the IndexWriter since
>> closing after every document would have poor performance. Instead I'm
>> keeping the IndexWriter open all the time. Problem is after a while the
>> number of open files continues to rise.
>> I've set the following parameters
>> on the IndexWriter:
>>
>> merge.factor=10
>> max.buffered.docs=1000
>>
>> After going over the api docs I thought this would mean it'd never create
>> more than 10 files before merging those files into a single file, but it's
>> creating 100's of files. Since I'm not closing the IndexWriter will it
>> merge the files? From reading the API docs it sounded like merging happens
>> regardless of flushing, commit, or close. Is that true? I've measured the
>> files that are increasing, and it's files associated with this one index
>> I'm leaving open. I have another index that I do close periodically, and
>> it's not growing like this one.
>>
>> I've read some posts about using commit() instead of close() in situations
>> like this because of its faster performance. However, commit() just flushes
>> to disk rather than flushing and optimizing like close(). Not sure whether
>> commit() is what I need or not. Any suggestions?
>>
>> Thanks
>> Charlie
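
The merge-factor and noCFSRatio reasoning in Simon's reply at the top of this message can be made concrete with a toy model in plain Java. This is a rough sketch, not Lucene code: the class MergeSketch, its method names, the assumption that every flush produces an equal-sized segment, and the figure of 10 files per non-compound segment (the worst-case guess from the reply) are all illustrative assumptions. It also checks the ratio against the final index size and ignores bookkeeping files, both of which simplify what a real index does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from Lucene.
class MergeSketch {

    // Assumed worst-case file count for a non-compound segment
    // (the rough "10 in the worst case" figure from the reply).
    static final int FILES_PER_NON_CFS_SEGMENT = 10;

    // Simulates `flushes` equal-sized flushes. Whenever the newest
    // `mergeFactor` segments have the same size they are merged into one
    // larger segment, and the check repeats so that merges can cascade.
    // Returns the sizes of the remaining segments, in flush units.
    static List<Long> segmentsAfterFlushes(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1L);
            while (tailIsMergeable(segments, mergeFactor)) {
                long size = segments.get(segments.size() - 1);
                for (int j = 0; j < mergeFactor; j++) {
                    segments.remove(segments.size() - 1);
                }
                segments.add(size * mergeFactor);
            }
        }
        return segments;
    }

    // True if the newest `mergeFactor` segments all have the same size.
    static boolean tailIsMergeable(List<Long> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        long newest = segments.get(n - 1);
        for (int j = n - mergeFactor; j < n; j++) {
            if (segments.get(j) != newest) {
                return false;
            }
        }
        return true;
    }

    // Simplified noCFSRatio rule: a segment is stored as a compound file
    // only if it is no larger than noCFSRatio times the total index size.
    static boolean useCompoundFile(long segmentSize, long totalSize, double noCFSRatio) {
        return noCFSRatio >= 1.0 || segmentSize <= noCFSRatio * totalSize;
    }

    // Estimates the file count: 1 per compound segment, the assumed
    // worst case per non-compound segment.
    static int estimateFiles(List<Long> segments, double noCFSRatio) {
        long total = 0;
        for (long s : segments) {
            total += s;
        }
        int files = 0;
        for (long s : segments) {
            files += useCompoundFile(s, total, noCFSRatio) ? 1 : FILES_PER_NON_CFS_SEGMENT;
        }
        return files;
    }

    public static void main(String[] args) {
        List<Long> segments = segmentsAfterFlushes(25, 10);
        System.out.println("segments: " + segments);
        System.out.println("files at noCFSRatio 0.1: " + estimateFiles(segments, 0.1));
        System.out.println("files at noCFSRatio 1.0: " + estimateFiles(segments, 1.0));
    }
}
```

In this model, 25 flushes with a merge factor of 10 leave 7 segments (two of size 10 and five of size 1). At a ratio of 0.1 the two large segments exceed the threshold and are counted as non-compound, giving an estimate of 25 files; at 1.0 every segment is compound and the estimate drops to 7. That is the effect the setNoCFSRatio(1.0) suggestion is meant to test.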