lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Getting fsync out of the loop
Date Wed, 07 Apr 2010 19:00:04 GMT
I don't have the system at hand now, but if I remember right fsync
took like 100-200ms.
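
For anyone who wants to check a figure like this on their own hardware, here is a minimal sketch of timing a single fsync from Java. The class name FsyncTimer and the 16 MB payload are made up for illustration, and the number you get depends heavily on the disk, the filesystem and how much dirty data the OS is holding at that moment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
class FsyncTimer {
    /** Writes `bytes` zero bytes to `file`, then returns how long force(true) took, in nanoseconds. */
    static long timeFsyncNanos(Path file, int bytes) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocate(bytes);
            while (buf.hasRemaining()) {
                ch.write(buf); // lands in the OS write cache, not necessarily on disk
            }
            long start = System.nanoTime();
            ch.force(true);    // ask the OS to push data and metadata to the device
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fsync-timer", ".bin");
        try {
            long nanos = timeFsyncNanos(tmp, 16 * 1024 * 1024);
            System.out.printf("fsync took %.1f ms%n", nanos / 1e6);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Only the force() call is timed, so the result says nothing about the cost of the writes themselves.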

2010/4/7 Shai Erera <serera@gmail.com>:
> Earwin - do you have some numbers to share on the running time of the
> indexing application? You've mentioned that if you take out fsync into a BG
> thread, the running time improves, but I'm curious to know by how much.
>
> Shai
>
> On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot <earwin@gmail.com> wrote:
>>
>> > Running out of disk space with fsync disabled won't lead to corruption.
>> > Even kill -9 the JRE process with fsync disabled won't corrupt.
>> > In these cases index just falls back to last successful commit.
>> >
>> > It's "only" power loss / OS / machine crash where you need fsync to
>> > avoid possible corruption (corruption may not even occur w/o fsync if
>> > you "get lucky").
>>
>> Sorry to disappoint you, but running out of disk space is worse than
>> kill -9.
>> You can write down the file (to cache in fact), close it, all without
>> getting any exceptions. And then it won't get flushed to disk because
>> the disk is full.
>> This can happen to segments file (and old one is deleted with default
>> deletion policy). This can happen to fat freq/prox files mentioned in
>> segments file (and yeah, the old segments file is deleted, so no
>> falling back).
>>
>> > What if your background thread simply committed every couple of minutes?
>> > What's the difference between taking the snapshot (which means you had
>> > to call commit previously) and commit it, to call iw.commit by a
>> > background merge?
>> --
>> > But: why do you need to commit so often?
>> To see stuff on reopen? Yes, I know about NRT.
>>
>> > You've reinvented autocommit=true!
>> ?? I'm doing regular commits, syncing down every Nth of them.
>>
>> > Doesn't this just BG the syncing?  Ie you could make a dedicated
>> > thread to do this.
>> Yes, exactly, this BGs the syncing to a dedicated thread. Threads
>> doing indexation/merging can continue unhampered.
>>
>> > One possible win with this approach is.... the cost of fsync should go
>> > way down the longer you wait after writing bytes to the file and
>> > before calling fsync.  This is because typically OS write caches
>> > expire by time (eg 30 seconds) so if you wait long enough the bytes
>> > will already at least be delivered to the IO system (but the IO system
>> > can do further caching which could still take time).  On windows at
>> > least I definitely noticed this effect -- wait some before fsync'ing
>> > and it's net/net much less costly.
>> Yup. In fact you can just hold on to the latest commit for N seconds,
>> then switch to the new latest commit.
>> OS will fsync everything for you.
>>
>>
>> I'm just playing around with a stupid idea. I'd like to have NRT
>> look-alike without binding readers and writers. :)
>> Right now it's probably best for me to save my time and cut over to
>> current NRT.
>> But. An important lesson was learnt - no fsyncing blows up your index
>> on out-of-disk-space.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
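
To make the "BG the syncing to a dedicated thread" idea from the quoted thread concrete, here is a rough sketch in plain java.nio. BackgroundSyncer and syncLater are hypothetical names, not Lucene API, and this is not the code discussed in the thread. It assumes the caller knows which files belong to a commit and keeps the previous commit around until the returned Future completes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: fsyncs files on one dedicated daemon thread so that
// indexing and merging threads never block on force().
class BackgroundSyncer implements AutoCloseable {
    private final ExecutorService syncThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "bg-fsync");
        t.setDaemon(true);
        return t;
    });

    /** Queues an fsync of every file; the Future fails if opening or forcing any file throws. */
    Future<Void> syncLater(List<Path> files) {
        Callable<Void> task = () -> {
            for (Path f : files) {
                try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE)) {
                    ch.force(true); // an I/O failure such as a full disk is reported here, if at all
                }
            }
            return null;
        };
        return syncThread.submit(task);
    }

    @Override
    public void close() {
        syncThread.shutdown(); // already queued syncs still run
    }
}
```

The point of returning a Future is the lesson from the thread: a write and close can succeed against the cache while the data never reaches a full disk, so the older commit should only become deletable after get() returns without an ExecutionException.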



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
