lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Searching while optimizing
Date Sun, 29 Nov 2009 10:57:09 GMT
OK, I dug into this one... it's actually a bug in IndexWriter, when
used in near real-time mode *and* when CFS is enabled.  In that case,
internally IndexWriter holds open the wrong SegmentReader, thus tying
up more disk space than it should.

Functionally, the bug is harmless -- it's just tying up disk space.

I've boiled your example down to a test case.

Thanks for catching & reporting this! I'll open an issue.

If it's a problem, you can work around the bug by either turning off
CFS, or using IndexReader.open (& reopen) to get your reader, instead
of the near real-time writer.getReader() method.
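
Either workaround would look roughly like this -- just a sketch against
the 2.9-era APIs, reusing the writer/dir from your test below, and
assuming the writer still has its default LogMergePolicy:

    // Workaround 1: turn off the compound file format on the merge policy
    LogMergePolicy mp = (LogMergePolicy) writer.getMergePolicy();
    mp.setUseCompoundFile(false);

    // Workaround 2: take readers from the Directory instead of writer.getReader()
    IndexReader reader = IndexReader.open(FSDirectory.open(dir), true);  // read-only
    // ... later, after the writer commits, refresh the reader:
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close();   // release the old segments so the writer can delete them
        reader = newReader;
    }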

Mike

On Sat, Nov 28, 2009 at 3:02 PM, vsevel <v.sevel@lombardodier.com> wrote:
>
> Hi, thanks for the explanations. Though I had no luck...
>
> I now close the reader before the commit. But still, only closing the
> writer gets us back to nominal. Here is the complete test:
>
>    @Test
>    public void optimize() throws Exception {
>        final File dir = new File("lucene_work/optimize");
>        dir.mkdirs();
>
>        for (File f : dir.listFiles()) {
>            f.delete();
>        }
>
>        Assert.assertEquals(0, dir.listFiles().length);
>
>        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
>        MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
>        IndexWriter writer = new IndexWriter(FSDirectory.open(dir), analyzer,
>                true, maxLength);
>        monitorIndexSize(dir);
>        long time = 2000;
>
>        log.info("writing...");
>        for (int i = 0; i < 1000000; i++) {
>            Document doc = new Document();
>            doc.add(new Field("foo", "bar " + i, Store.YES,
>                    Index.NOT_ANALYZED));
>            writer.addDocument(doc);
>        }
>
>        writer.commit();
>        log.info("done write");
>        Thread.sleep(time);
>
>        log.info("opening reader...");
>        IndexReader reader = writer.getReader();
>        log.info("done open reader");
>        Thread.sleep(time);
>
>        log.info("optimizing...");
>        writer.optimize();
>        log.info("done optimize");
>        Thread.sleep(time);
>
>        log.info("closing reader...");
>        reader.close();
>        log.info("done reader close");
>        Thread.sleep(time);
>
>        log.info("committing...");
>        writer.commit();
>        log.info("done commit");
>        Thread.sleep(time);
>
>        log.info("closing writer...");
>        writer.close();
>        log.info("done writer close");
>        Thread.sleep(time);
>    }
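>
> monitorIndexSize just starts a background thread that logs the total size
> of the index directory once a second -- roughly something like this:
>
>    private void monitorIndexSize(final File dir) {
>        Thread t = new Thread() {
>            public void run() {
>                while (true) {
>                    long bytes = 0;
>                    for (File f : dir.listFiles()) {
>                        bytes += f.length();
>                    }
>                    log.info("size=" + (bytes / (1024 * 1024)) + "Mb");
>                    try {
>                        Thread.sleep(1000);
>                    } catch (InterruptedException e) {
>                        return;
>                    }
>                }
>            }
>        };
>        t.setDaemon(true);
>        t.start();
>    }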
>
> And an exec log:
>
> 15:58:46,875  INFO logserver.LuceneSystemTest     writing...
> 15:58:46,875  INFO logserver.LuceneSystemTest     size=0Mb
> 15:58:47,891  INFO logserver.LuceneSystemTest     size=1Mb
> 15:58:48,891  INFO logserver.LuceneSystemTest     size=3Mb
> 15:58:49,891  INFO logserver.LuceneSystemTest     size=5Mb
> 15:58:50,906  INFO logserver.LuceneSystemTest     size=8Mb
> 15:58:51,906  INFO logserver.LuceneSystemTest     size=9Mb
> 15:58:52,906  INFO logserver.LuceneSystemTest     size=12Mb
> 15:58:53,922  INFO logserver.LuceneSystemTest     size=14Mb
> 15:58:54,984  INFO logserver.LuceneSystemTest     size=15Mb
> 15:58:55,984  INFO logserver.LuceneSystemTest     size=18Mb
> 15:58:56,984  INFO logserver.LuceneSystemTest     size=20Mb
> 15:58:58,000  INFO logserver.LuceneSystemTest     size=21Mb
> 15:58:59,000  INFO logserver.LuceneSystemTest     size=25Mb
> 15:59:00,016  INFO logserver.LuceneSystemTest     size=27Mb
> 15:59:01,016  INFO logserver.LuceneSystemTest     size=29Mb
> 15:59:02,016  INFO logserver.LuceneSystemTest     size=52Mb
> 15:59:03,031  INFO logserver.LuceneSystemTest     size=52Mb
> 15:59:04,031  INFO logserver.LuceneSystemTest     size=32Mb
> 15:59:04,328  INFO logserver.LuceneSystemTest     done write
> 15:59:05,031  INFO logserver.LuceneSystemTest     size=32Mb
> 15:59:06,031  INFO logserver.LuceneSystemTest     size=32Mb
> 15:59:06,328  INFO logserver.LuceneSystemTest     opening reader...
> 15:59:06,453  INFO logserver.LuceneSystemTest     done open reader
> 15:59:07,031  INFO logserver.LuceneSystemTest     size=32Mb
> 15:59:08,031  INFO logserver.LuceneSystemTest     size=32Mb
> 15:59:08,453  INFO logserver.LuceneSystemTest     optimizing...
> 15:59:09,047  INFO logserver.LuceneSystemTest     size=34Mb
> 15:59:10,047  INFO logserver.LuceneSystemTest     size=37Mb
> 15:59:11,047  INFO logserver.LuceneSystemTest     size=40Mb
> 15:59:12,047  INFO logserver.LuceneSystemTest     size=42Mb
> 15:59:12,391  INFO logserver.LuceneSystemTest     done optimize
> 15:59:13,062  INFO logserver.LuceneSystemTest     size=55Mb
> 15:59:14,062  INFO logserver.LuceneSystemTest     size=55Mb
> 15:59:14,391  INFO logserver.LuceneSystemTest     closing reader...
> 15:59:14,406  INFO logserver.LuceneSystemTest     done reader close
> 15:59:15,062  INFO logserver.LuceneSystemTest     size=55Mb
> 15:59:16,062  INFO logserver.LuceneSystemTest     size=55Mb
> 15:59:16,406  INFO logserver.LuceneSystemTest     committing...
> 15:59:16,469  INFO logserver.LuceneSystemTest     done commit
> 15:59:17,062  INFO logserver.LuceneSystemTest     size=43Mb
> 15:59:18,062  INFO logserver.LuceneSystemTest     size=43Mb
> 15:59:18,469  INFO logserver.LuceneSystemTest     closing writer...
> 15:59:18,484  INFO logserver.LuceneSystemTest     done writer close
> 15:59:19,062  INFO logserver.LuceneSystemTest     size=32Mb
> 15:59:20,078  INFO logserver.LuceneSystemTest     size=32Mb
>
> I guess I could do a close and reopen if I really need to. But if
> there is a nicer, more natural solution, I would love to know about it.
>
> thanks,
> vincent
>
>
> Michael McCandless-2 wrote:
>>
>> Phew, thanks for testing!  It's all explainable...
>>
>> When you have a reader open, it prevents the segments it had opened
>> from being deleted.
>>
>> When you close that reader, the segments can be deleted; however, that
>> won't happen until the writer next tries to delete, which it does only
>> periodically (eg, on flushing a new segment, committing a merge, etc.).
>>
>> Could you try closing your reader, then calling writer.commit() (which
>> is a no-op, since you had already committed, but it may tickle the
>> writer into attempting the deletions), and see if that frees up disk
>> space without closing the writer?
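>>
>> Ie, right after the optimize finishes, something like:
>>
>>     reader.close();     // release the pre-optimize segments
>>     writer.commit();    // no-op commit, but may kick off the deletions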
>>
>> Mike
>>
>> On Fri, Nov 27, 2009 at 4:12 PM, vsevel <v.sevel@lombardodier.com> wrote:
>>> I am starting my tests with an unoptimized 40Mb index. I have 3 test
>>> cases:
>>> 1) open a writer, optimize, commit, close
>>> 2) open a writer, open a reader from the writer, optimize, commit, close
>>> 3) same as 2) except the reader is opened while the optimize is done in a
>>> different thread
>>>
>>> During all the tests, I monitor the size of the index on the disk. The
>>> results are:
>>> 1) initial=41Mb, before end of optimize=122Mb, after end of optimize=81Mb,
>>>    after commit=40Mb, after writer close=40Mb
>>> 2) initial=41Mb, before end of optimize=122Mb, after end of optimize=104Mb,
>>>    after commit=104Mb, after reader close=104Mb, after writer close=40Mb
>>> 3) initial=41Mb, before end of optimize=145Mb, after end of optimize=127Mb,
>>>    after commit=103Mb, after reader close=103Mb, after writer close=40Mb
>>>
>>> From your different posts I assumed that a commit would have the same
>>> effect as a close as far as reclaiming disk space is concerned. However,
>>> test cases 2 and 3 show that, whether the reader is opened before or
>>> during the optimize, we end up after the commit with an index that is
>>> 2.5 times the nominal size. Closing the reader does not change anything;
>>> only closing the writer gets the index back to nominal.
>>>
>>> Why can neither the commit nor closing the reader get us back to nominal?
>>> Do you recommend closing and recreating the writer after an optimize?
>>>
>>> thanks
>>> vincent
>>>
>>>
>>> Michael McCandless-2 wrote:
>>>>
>>>> OK, I'll add that to the javadocs; thanks.
>>>>
>>>> But the fact that you weren't closing the old readers was probably
>>>> also tying up lots of disk space...
>>>>
>>>> Mike
>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

