lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Searching while optimizing
Date Sun, 29 Nov 2009 11:09:57 GMT
OK I opened https://issues.apache.org/jira/browse/LUCENE-2097 to track this.

Thanks v.sevel!

Mike

On Sun, Nov 29, 2009 at 5:57 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> OK I dug down on this one... it's actually a bug in IndexWriter, when
> used in near real-time mode *and* when CFS is enabled.  In that case,
> internally IndexWriter holds open the wrong SegmentReader, thus tying
> up more disk space than it should.
>
> Functionally, the bug is harmless -- it's just tying up disk space.
>
> I've boiled your example down to a test case.
>
> Thanks for catching & reporting this! I'll open an issue.
>
> If it's a problem, you can work around the bug by either turning off
> CFS or using IndexReader.open (& reopen) to get your reader, instead
> of the near real-time writer.getReader() method.
>
> Mike
>
> On Sat, Nov 28, 2009 at 3:02 PM, vsevel <v.sevel@lombardodier.com> wrote:
>>
>> Hi, thanks for the explanations. Though I had no luck...
>>
>> I now close the reader before the commit. But still, only closing the
>> writer gets us back to the nominal size. Here is the complete test:
>>
>>    @Test
>>    public void optimize() throws Exception {
>>        final File dir = new File("lucene_work/optimize");
>>        dir.mkdirs();
>>
>>        for (File f : dir.listFiles()) {
>>            f.delete();
>>        }
>>
>>        Assert.assertEquals(0, dir.listFiles().length);
>>
>>        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
>>        MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
>>        IndexWriter writer = new IndexWriter(FSDirectory.open(dir),
>>                analyzer, true, maxLength);
>>        monitorIndexSize(dir);
>>        long time = 2000;
>>
>>        log.info("writing...");
>>        for (int i = 0; i < 1000000; i++) {
>>            Document doc = new Document();
>>            doc.add(new Field("foo", "bar " + i, Store.YES,
>>                    Index.NOT_ANALYZED));
>>            writer.addDocument(doc);
>>        }
>>
>>        writer.commit();
>>        log.info("done write");
>>        Thread.sleep(time);
>>
>>        log.info("opening reader...");
>>        IndexReader reader = writer.getReader();
>>        log.info("done open reader");
>>        Thread.sleep(time);
>>
>>        log.info("optimizing...");
>>        writer.optimize();
>>        log.info("done optimize");
>>        Thread.sleep(time);
>>
>>        log.info("closing reader...");
>>        reader.close();
>>        log.info("done reader close");
>>        Thread.sleep(time);
>>
>>        log.info("committing...");
>>        writer.commit();
>>        log.info("done commit");
>>        Thread.sleep(time);
>>
>>        log.info("closing writer...");
>>        writer.close();
>>        log.info("done writer close");
>>        Thread.sleep(time);
>>    }
>>
>> And an exec log:
>>
>> 15:58:46,875  INFO logserver.LuceneSystemTest     writing...
>> 15:58:46,875  INFO logserver.LuceneSystemTest     size=0Mb
>> 15:58:47,891  INFO logserver.LuceneSystemTest     size=1Mb
>> 15:58:48,891  INFO logserver.LuceneSystemTest     size=3Mb
>> 15:58:49,891  INFO logserver.LuceneSystemTest     size=5Mb
>> 15:58:50,906  INFO logserver.LuceneSystemTest     size=8Mb
>> 15:58:51,906  INFO logserver.LuceneSystemTest     size=9Mb
>> 15:58:52,906  INFO logserver.LuceneSystemTest     size=12Mb
>> 15:58:53,922  INFO logserver.LuceneSystemTest     size=14Mb
>> 15:58:54,984  INFO logserver.LuceneSystemTest     size=15Mb
>> 15:58:55,984  INFO logserver.LuceneSystemTest     size=18Mb
>> 15:58:56,984  INFO logserver.LuceneSystemTest     size=20Mb
>> 15:58:58,000  INFO logserver.LuceneSystemTest     size=21Mb
>> 15:58:59,000  INFO logserver.LuceneSystemTest     size=25Mb
>> 15:59:00,016  INFO logserver.LuceneSystemTest     size=27Mb
>> 15:59:01,016  INFO logserver.LuceneSystemTest     size=29Mb
>> 15:59:02,016  INFO logserver.LuceneSystemTest     size=52Mb
>> 15:59:03,031  INFO logserver.LuceneSystemTest     size=52Mb
>> 15:59:04,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:04,328  INFO logserver.LuceneSystemTest     done write
>> 15:59:05,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:06,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:06,328  INFO logserver.LuceneSystemTest     opening reader...
>> 15:59:06,453  INFO logserver.LuceneSystemTest     done open reader
>> 15:59:07,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:08,031  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:08,453  INFO logserver.LuceneSystemTest     optimizing...
>> 15:59:09,047  INFO logserver.LuceneSystemTest     size=34Mb
>> 15:59:10,047  INFO logserver.LuceneSystemTest     size=37Mb
>> 15:59:11,047  INFO logserver.LuceneSystemTest     size=40Mb
>> 15:59:12,047  INFO logserver.LuceneSystemTest     size=42Mb
>> 15:59:12,391  INFO logserver.LuceneSystemTest     done optimize
>> 15:59:13,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:14,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:14,391  INFO logserver.LuceneSystemTest     closing reader...
>> 15:59:14,406  INFO logserver.LuceneSystemTest     done reader close
>> 15:59:15,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:16,062  INFO logserver.LuceneSystemTest     size=55Mb
>> 15:59:16,406  INFO logserver.LuceneSystemTest     committing...
>> 15:59:16,469  INFO logserver.LuceneSystemTest     done commit
>> 15:59:17,062  INFO logserver.LuceneSystemTest     size=43Mb
>> 15:59:18,062  INFO logserver.LuceneSystemTest     size=43Mb
>> 15:59:18,469  INFO logserver.LuceneSystemTest     closing writer...
>> 15:59:18,484  INFO logserver.LuceneSystemTest     done writer close
>> 15:59:19,062  INFO logserver.LuceneSystemTest     size=32Mb
>> 15:59:20,078  INFO logserver.LuceneSystemTest     size=32Mb
>>
>> I guess I could close and reopen the writer if I really need to. But if
>> there is a nicer and more natural solution, I would love to know about it.
>>
>> thanks,
>> vincent
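The test above calls monitorIndexSize(dir), but the thread never shows that helper. Below is a minimal sketch of what such a helper might look like, assuming it simply sums the file lengths in the index directory once per second and prints them in the same size=NMb form as the log. The class name IndexSizeMonitor and the method directorySizeBytes are hypothetical and are not taken from the original test.

```java
import java.io.File;

final class IndexSizeMonitor {

    /** Sums the lengths of the regular files directly inside dir (no recursion). */
    public static long directorySizeBytes(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // dir does not exist or is not a directory
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** Starts a daemon thread that prints the directory size about once per second. */
    public static Thread monitorIndexSize(final File dir) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        long mb = directorySizeBytes(dir) / (1024L * 1024L);
                        System.out.println("size=" + mb + "Mb");
                        Thread.sleep(1000);
                    }
                } catch (InterruptedException e) {
                    // Interrupted: stop monitoring.
                }
            }
        }, "index-size-monitor");
        t.setDaemon(true); // do not keep the JVM alive just for monitoring
        t.start();
        return t;
    }
}
```

Because the thread is a daemon, it stops when the test JVM exits. Calling interrupt() on the returned thread stops it earlier.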
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>> Phew, thanks for testing!  It's all explainable...
>>>
>>> When you have a reader open, it prevents the segments it had opened
>>> from being deleted.
>>>
>>> When you close that reader, the segments could be deleted, however,
>>> that won't happen until the writer next tries to delete, which it does
>>> only periodically (eg, on flushing a new segment, committing a new
>>> merge, etc.).
>>>
>>> Could you try closing your reader, then calling writer.commit() (which
>>> is a no-op, since you had already committed, but it may tickle the
>>> writer into attempting the deletions), and see if that frees up disk
>>> space w/o closing?
>>>
>>> Mike
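The explanation above (an open reader pins the segment files it opened; once it is closed the files become deletable, but nothing is removed until the writer next attempts deletions) can be illustrated with a toy reference-counting model. This is not Lucene code and makes no claim about how IndexWriter is implemented internally. The class DeferredDeleteModel, its methods, and the file names used with it are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DeferredDeleteModel {
    private final Map<String, Integer> refCounts = new HashMap<String, Integer>();
    private final Set<String> onDisk = new TreeSet<String>();

    /** A new file appears on disk; nobody references it yet. */
    public void addFile(String name) {
        onDisk.add(name);
    }

    /** A commit point or an open reader starts using the file. */
    public void incRef(String name) {
        if (!onDisk.contains(name)) {
            throw new IllegalArgumentException("unknown file: " + name);
        }
        Integer c = refCounts.get(name);
        refCounts.put(name, c == null ? 1 : c + 1);
    }

    /** Drops one reference. Note: this never deletes anything by itself. */
    public void decRef(String name) {
        Integer c = refCounts.get(name);
        if (c == null) {
            throw new IllegalStateException("not referenced: " + name);
        }
        if (c == 1) {
            refCounts.remove(name);
        } else {
            refCounts.put(name, c - 1);
        }
    }

    /** The periodic deletion attempt: removes every unreferenced file, returns how many. */
    public int checkpoint() {
        int deleted = 0;
        Iterator<String> it = onDisk.iterator();
        while (it.hasNext()) {
            if (!refCounts.containsKey(it.next())) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public Set<String> filesOnDisk() {
        return Collections.unmodifiableSet(onDisk);
    }
}
```

In this model, if a "reader" still holds a reference to an old file when the "writer" runs checkpoint(), the file survives. After the reader calls decRef, the file is still on disk until the next checkpoint() call, which mirrors the suggestion to call writer.commit() after closing the reader.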
>>>
>>> On Fri, Nov 27, 2009 at 4:12 PM, vsevel <v.sevel@lombardodier.com> wrote:
>>>> I am starting my tests with an unoptimized 40Mb index. I have 3 test
>>>> cases:
>>>> 1) open a writer, optimize, commit, close
>>>> 2) open a writer, open a reader from the writer, optimize, commit, close
>>>> 3) same as 2) except the reader is opened while the optimize is done in a
>>>> different thread
>>>>
>>>> During all the tests, I monitor the size of the index on the disk. The
>>>> results are:
>>>> 1) initial=41Mb, before end of optimize=122Mb, after end of optimize=81Mb,
>>>>    after commit=40Mb, after writer close=40Mb
>>>> 2) initial=41Mb, before end of optimize=122Mb, after end of optimize=104Mb,
>>>>    after commit=104Mb, after reader close=104Mb, after writer close=40Mb
>>>> 3) initial=41Mb, before end of optimize=145Mb, after end of optimize=127Mb,
>>>>    after commit=103Mb, after reader close=103Mb, after writer close=40Mb
>>>>
>>>> From your different posts I assumed that a commit would have the same
>>>> effect as a close as far as reclaiming disk space is concerned. However,
>>>> test cases 2 and 3 show that, whether the reader is opened before or
>>>> during the optimize, we end up after commit with an index that is 2.5
>>>> times the nominal size. Closing the reader does not change anything. Only
>>>> closing the writer gets the index back to nominal.
>>>>
>>>> Why can neither the commit nor closing the reader get us back to nominal?
>>>> Do you recommend closing and recreating a new writer after an optimize?
>>>>
>>>> thanks
>>>> vincent
>>>>
>>>>
>>>> Michael McCandless-2 wrote:
>>>>>
>>>>> OK, I'll add that to the javadocs; thanks.
>>>>>
>>>>> But the fact that you weren't closing the old readers was probably
>>>>> also tying up lots of disk space...
>>>>>
>>>>> Mike
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Searching-while-optimizing-tp26485138p26545384.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>
>> --
>> View this message in context: http://old.nabble.com/Searching-while-optimizing-tp26485138p26556468.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>
