Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 2444 invoked from network); 29 Nov 2009 11:10:25 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 29 Nov 2009 11:10:25 -0000 Received: (qmail 25052 invoked by uid 500); 29 Nov 2009 11:10:23 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 24950 invoked by uid 500); 29 Nov 2009 11:10:23 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 24940 invoked by uid 99); 29 Nov 2009 11:10:23 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 29 Nov 2009 11:10:23 +0000 X-ASF-Spam-Status: No, hits=-2.6 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [209.85.211.185] (HELO mail-yw0-f185.google.com) (209.85.211.185) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 29 Nov 2009 11:10:19 +0000 Received: by ywh15 with SMTP id 15so2387321ywh.5 for ; Sun, 29 Nov 2009 03:09:57 -0800 (PST) MIME-Version: 1.0 Received: by 10.150.89.3 with SMTP id m3mr5003918ybb.186.1259492997466; Sun, 29 Nov 2009 03:09:57 -0800 (PST) In-Reply-To: <9ac0c6aa0911290257u2e40c1aew3c7971ab2a00a4a7@mail.gmail.com> References: <26485138.post@talk.nabble.com> <9ac0c6aa0911240159v34328d6x3abda02ac95b353a@mail.gmail.com> <26496505.post@talk.nabble.com> <9ac0c6aa0911240859i8de402fr77752035e633097f@mail.gmail.com> <26502131.post@talk.nabble.com> <9ac0c6aa0911241239y3cbf2487r78855d8d4a8f922c@mail.gmail.com> <26545384.post@talk.nabble.com> <9ac0c6aa0911271347u3db3e42waa74f001e393ea78@mail.gmail.com> <26556468.post@talk.nabble.com> <9ac0c6aa0911290257u2e40c1aew3c7971ab2a00a4a7@mail.gmail.com> Date: Sun, 29 Nov 2009 06:09:57 -0500 Message-ID: <9ac0c6aa0911290309w4b23de61ke42cf6719f6682a5@mail.gmail.com> Subject: Re: Searching while optimizing From: Michael McCandless To: java-user@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable OK I opened https://issues.apache.org/jira/browse/LUCENE-2097 to track this= . Thanks v.sevel! Mike On Sun, Nov 29, 2009 at 5:57 AM, Michael McCandless wrote: > OK I dug down on this one... it's actually a bug in IndexWriter, when > used in near real-time mode *and* when CFS is enabled. =A0In that case, > internally IndexWriter holds open the wrong SegmentReader, thus tying > up more disk space than it should. > > Functionally, the bug is harmless -- it's just tying up disk space. > > I've boiled your example down to a test case. > > Thanks for catching & reporting this! I'll open an issue. > > If it's a problem, you can workaround the bug by either turning off > CFS, or, using IndexReader.open (& reopen) to get your reader, instead > of the near real-time writer. getReader() method. > > Mike > > On Sat, Nov 28, 2009 at 3:02 PM, vsevel wrote: >> >> Hi, thanks for the explanations. Though I had no luck... >> >> I now do the close of the reader before the commit. But still, only the >> close get us back to nominal. Here is the complete test: >> >> =A0 =A0@Test >> =A0 =A0public void optimize() throws Exception { >> =A0 =A0 =A0 =A0final File dir =3D new File("lucene_work/optimize"); >> =A0 =A0 =A0 =A0dir.mkdirs(); >> >> =A0 =A0 =A0 =A0for (File f : dir.listFiles()) { >> =A0 =A0 =A0 =A0 =A0 =A0f.delete(); >> =A0 =A0 =A0 =A0} >> >> =A0 =A0 =A0 =A0Assert.assertEquals(0, dir.listFiles().length); >> >> =A0 =A0 =A0 =A0Analyzer analyzer =3D new StandardAnalyzer(Version.LUCENE= _CURRENT); >> =A0 =A0 =A0 =A0MaxFieldLength maxLength =3D IndexWriter.MaxFieldLength.U= NLIMITED; >> =A0 =A0 =A0 =A0IndexWriter writer =3D new IndexWriter(FSDirectory.open(d= ir), >> analyzer, true, maxLength); >> =A0 =A0 =A0 =A0monitorIndexSize(dir); >> =A0 =A0 =A0 =A0long time =3D 2000; >> >> =A0 =A0 =A0 =A0log.info("writing..."); >> =A0 =A0 =A0 =A0for (int i =3D 0; i < 1000000; i++) { >> =A0 =A0 =A0 =A0 =A0 =A0Document doc =3D new Document(); >> =A0 =A0 =A0 =A0 =A0 =A0doc.add(new Field("foo", "bar " + i, Store.YES, >> Index.NOT_ANALYZED)); >> =A0 =A0 =A0 =A0 =A0 =A0writer.addDocument(doc); >> =A0 =A0 =A0 =A0} >> >> =A0 =A0 =A0 =A0writer.commit(); >> =A0 =A0 =A0 =A0log.info("done write"); >> =A0 =A0 =A0 =A0Thread.sleep(time); >> >> =A0 =A0 =A0 =A0log.info("opening reader..."); >> =A0 =A0 =A0 =A0IndexReader reader =3D writer.getReader(); >> =A0 =A0 =A0 =A0log.info("done open reader"); >> =A0 =A0 =A0 =A0Thread.sleep(time); >> >> =A0 =A0 =A0 =A0log.info("optimizing..."); >> =A0 =A0 =A0 =A0writer.optimize(); >> =A0 =A0 =A0 =A0log.info("done optimize"); >> =A0 =A0 =A0 =A0Thread.sleep(time); >> >> =A0 =A0 =A0 =A0log.info("closing reader..."); >> =A0 =A0 =A0 =A0reader.close(); >> =A0 =A0 =A0 =A0log.info("done reader close"); >> =A0 =A0 =A0 =A0Thread.sleep(time); >> >> =A0 =A0 =A0 =A0log.info("committing..."); >> =A0 =A0 =A0 =A0writer.commit(); >> =A0 =A0 =A0 =A0log.info("done commit"); >> =A0 =A0 =A0 =A0Thread.sleep(time); >> >> =A0 =A0 =A0 =A0log.info("closing writer..."); >> =A0 =A0 =A0 =A0writer.close(); >> =A0 =A0 =A0 =A0log.info("done writer close"); >> =A0 =A0 =A0 =A0Thread.sleep(time); >> =A0 =A0} >> >> And an exec log: >> >> 15:58:46,875 =A0INFO logserver.LuceneSystemTest =A0 =A0 writing... >> 15:58:46,875 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D0Mb >> 15:58:47,891 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D1Mb >> 15:58:48,891 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D3Mb >> 15:58:49,891 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D5Mb >> 15:58:50,906 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D8Mb >> 15:58:51,906 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D9Mb >> 15:58:52,906 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D12Mb >> 15:58:53,922 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D14Mb >> 15:58:54,984 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D15Mb >> 15:58:55,984 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D18Mb >> 15:58:56,984 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D20Mb >> 15:58:58,000 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D21Mb >> 15:58:59,000 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D25Mb >> 15:59:00,016 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D27Mb >> 15:59:01,016 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D29Mb >> 15:59:02,016 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D52Mb >> 15:59:03,031 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D52Mb >> 15:59:04,031 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> 15:59:04,328 =A0INFO logserver.LuceneSystemTest =A0 =A0 done write >> 15:59:05,031 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> 15:59:06,031 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> 15:59:06,328 =A0INFO logserver.LuceneSystemTest =A0 =A0 opening reader..= . >> 15:59:06,453 =A0INFO logserver.LuceneSystemTest =A0 =A0 done open reader >> 15:59:07,031 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> 15:59:08,031 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> 15:59:08,453 =A0INFO logserver.LuceneSystemTest =A0 =A0 optimizing... >> 15:59:09,047 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D34Mb >> 15:59:10,047 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D37Mb >> 15:59:11,047 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D40Mb >> 15:59:12,047 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D42Mb >> 15:59:12,391 =A0INFO logserver.LuceneSystemTest =A0 =A0 done optimize >> 15:59:13,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D55Mb >> 15:59:14,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D55Mb >> 15:59:14,391 =A0INFO logserver.LuceneSystemTest =A0 =A0 closing reader..= . >> 15:59:14,406 =A0INFO logserver.LuceneSystemTest =A0 =A0 done reader clos= e >> 15:59:15,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D55Mb >> 15:59:16,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D55Mb >> 15:59:16,406 =A0INFO logserver.LuceneSystemTest =A0 =A0 committing... >> 15:59:16,469 =A0INFO logserver.LuceneSystemTest =A0 =A0 done commit >> 15:59:17,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D43Mb >> 15:59:18,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D43Mb >> 15:59:18,469 =A0INFO logserver.LuceneSystemTest =A0 =A0 closing writer..= . >> 15:59:18,484 =A0INFO logserver.LuceneSystemTest =A0 =A0 done writer clos= e >> 15:59:19,062 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> 15:59:20,078 =A0INFO logserver.LuceneSystemTest =A0 =A0 size=3D32Mb >> >> I guess I would be able to do a close and reopen if really I need to. Bu= t if >> there is a nicer and more natural solution, I would love to know about i= t. >> >> thanks, >> vincent >> >> >> Michael McCandless-2 wrote: >>> >>> Phew, thanks for testing! =A0It's all explainable... >>> >>> When you have a reader open, it prevents the segments it had opened >>> from being deleted. >>> >>> When you close that reader, the segments could be deleted, however, >>> that won't happen until the writer next tries to delete, which it does >>> only periodically (eg, on flushing a new segment, committing a new >>> merge, etc.). >>> >>> Could you try closing your reader, then calling writer.commit() (which >>> is a no-op, since you had already committed, but it may tickle the >>> writer into attempting the deletions), and see if that frees up disk >>> space w/o closing? >>> >>> Mike >>> >>> On Fri, Nov 27, 2009 at 4:12 PM, vsevel wrot= e: >>>> I am starting my tests with an unoptimized 40Mb index. I have 3 test >>>> cases: >>>> 1) open a writer, optimize, commit, close >>>> 2) open a writer, open a reader from the writer, optimize, commit, clo= se >>>> 3) same as 2) except the reader is opened while the optimize is done i= n a >>>> different thread >>>> >>>> During all the tests, I monitor the size of the index on the disk. The >>>> results are: >>>> 1) initial=3D41Mb, before end of optimize=3D122Mb, after end of >>>> optimize=3D81Mb, >>>> after commit=3D40Mb, =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0after writer close=3D40Mb >>>> 2) initial=3D41Mb, before end of optimize=3D122Mb, after end of >>>> optimize=3D104Mb, >>>> after commit=3D104Mb, after reader close=3D104Mb, after writer close= =3D40Mb >>>> 3) initial=3D41Mb, before end of optimize=3D145Mb, after end of >>>> optimize=3D127Mb, >>>> after commit=3D103Mb, after reader close=3D103Mb, after writer close= =3D40Mb >>>> >>>> From your different posts I assumed that a commit would have the same >>>> effect >>>> as a close as far as reclaiming disk space is concerned. however test >>>> cases >>>> 2 and 3 show that whether the reader is opened before or during the >>>> optimize >>>> we end up after commit with an index that is 2.5 times the nominal siz= e. >>>> closing the reader does not change anything. only a close can get us t= he >>>> index back to nominal. >>>> >>>> What is the reason why the commit nor closing the reader can get us ba= ck >>>> to >>>> nominal? >>>> Do you recommend closing and recreating a new writer after an optimize= ? >>>> >>>> thanks >>>> vincent >>>> >>>> >>>> Michael McCandless-2 wrote: >>>>> >>>>> OK, I'll add that to the javadocs; thanks. >>>>> >>>>> But the fact that you weren't closing the old readers was probably >>>>> also tying up lots of disk space... >>>>> >>>>> Mike >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/Searching-while-optimizing-tp26485138p26545384.h= tml >>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>> For additional commands, e-mail: java-user-help@lucene.apache.org >>> >>> >>> >> >> -- >> View this message in context: http://old.nabble.com/Searching-while-opti= mizing-tp26485138p26556468.html >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org