lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Getting fsync out of the loop
Date Tue, 06 Apr 2010 23:26:51 GMT
> Running out of disk space with fsync disabled won't lead to corruption.
> Even kill -9 the JRE process with fsync disabled won't corrupt.
> In these cases index just falls back to last successful commit.
>
> It's "only" power loss / OS / machine crash where you need fsync to
> avoid possible corruption (corruption may not even occur w/o fsync if
> you "get lucky").

Sorry to disappoint you, but running out of disk space is worse than kill -9.
You can write the file out (to cache, in fact) and close it, all without
getting any exceptions. And then it won't get flushed to disk, because the
disk is full. This can happen to the segments file (and the old one is deleted
under the default deletion policy). This can happen to the fat freq/prox files
referenced by the segments file (and yeah, the old segments file is deleted,
so no falling back).
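
To make the failure mode concrete, here is a minimal sketch in plain Java (not Lucene code; the class and method names are made up for illustration). It forces the bytes to the device before the file is closed, so an I/O error such as a full disk has a chance to surface as an IOException instead of being lost after close(). Exactly where an out-of-space error is reported depends on the OS and filesystem, so treat this as an illustration only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
public class DurableWrite {

    /** Writes data to path and forces it to the storage device before returning. */
    public static void writeDurably(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // write() may accept fewer bytes than requested, so loop until done.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Up to here the bytes may live only in the OS cache.
            // force(true) asks the OS to push content and metadata to the device;
            // a failure to do so is reported here as an IOException.
            ch.force(true);
        }
    }
}
```

Without the force(true) call, the write and the close can both return normally while the data is still only in cache, which is the situation described above.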

> What if your background thread simply committed every couple of minutes?
> What's the difference between taking the snapshot (which means you had
> to call commit previously) and commit it, to call iw.commit by a background merge?
--
> But: why do you need to commit so often?
To see stuff on reopen? Yes, I know about NRT.

> You've reinvented autocommit=true!
?? I'm doing regular commits, syncing down every Nth one.

> Doesn't this just BG the syncing?  Ie you could make a dedicated
> thread to do this.
Yes, exactly, this BGs the syncing to a dedicated thread. Threads
doing indexing/merging can continue unhampered.
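
A rough sketch of that arrangement, in plain Java with no Lucene calls (EveryNthSyncer and onCommit are hypothetical names, and the sync itself is passed in as a Runnable because how the files get synced is not shown here). Every Nth commit hands the sync work to a single dedicated thread, so the caller returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Lucene.
public class EveryNthSyncer implements AutoCloseable {
    private final int n;
    private final AtomicLong commits = new AtomicLong();
    // One dedicated thread, so syncs never run on indexing/merging threads.
    private final ExecutorService syncThread = Executors.newSingleThreadExecutor();

    public EveryNthSyncer(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Call after each commit; returns true if a background sync was scheduled. */
    public boolean onCommit(Runnable syncAction) {
        if (commits.incrementAndGet() % n != 0) {
            return false; // not an Nth commit, skip the sync
        }
        syncThread.execute(syncAction);
        return true;
    }

    /** Stops accepting work and waits for already scheduled syncs to finish. */
    @Override
    public void close() {
        syncThread.shutdown();
        try {
            syncThread.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With n = 3, commits 3, 6, 9 and so on trigger a sync; the commits in between are never synced explicitly, which is where the out-of-disk-space risk discussed above comes in.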

> One possible win with this approach is.... the cost of fsync should go
> way down the longer you wait after writing bytes to the file and
> before calling fsync.  This is because typically OS write caches
> expire by time (eg 30 seconds) so if you wait long enough the bytes
> will already at least be delivered to the IO system (but the IO system
> can do further caching which could still take time).  On windows at
> least I definitely noticed this effect -- wait some before fsync'ing
> and it's net/net much less costly.
Yup. In fact you can just hold on to the latest commit for N seconds,
then switch to the new latest commit.
By then the OS will have flushed everything for you.
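
The bookkeeping for "hold on for N seconds, then switch" could look roughly like this. It is a sketch only: AgedCommitTracker is a made-up name, commits are identified by a plain long, and the caller passes in the current time. It also assumes the OS really has written the data back within the hold period, which nothing here verifies.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Lucene.
public class AgedCommitTracker {
    private static final class Entry {
        final long id;
        final long atMillis;

        Entry(long id, long atMillis) {
            this.id = id;
            this.atMillis = atMillis;
        }
    }

    private final long holdMillis;
    private final Deque<Entry> pending = new ArrayDeque<>();
    private long current = -1; // -1 means no commit has aged enough yet

    public AgedCommitTracker(long holdMillis) {
        this.holdMillis = holdMillis;
    }

    /** Records a new commit made at the given time (commits arrive in time order). */
    public void recordCommit(long id, long nowMillis) {
        pending.addLast(new Entry(id, nowMillis));
    }

    /** Returns the newest commit that is at least holdMillis old, or -1 if none. */
    public long currentCommit(long nowMillis) {
        while (!pending.isEmpty()
                && nowMillis - pending.peekFirst().atMillis >= holdMillis) {
            current = pending.pollFirst().id;
        }
        return current;
    }
}
```

Older commits would only be deleted once currentCommit has moved past them, so there is always an aged commit to fall back to.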


I'm just playing around with a stupid idea. I'd like to have an NRT
look-alike without binding readers and writers. :)
Right now it's probably best for me to save my time and cut over to current NRT.
But. An important lesson was learnt - skipping fsync blows up your index
on out-of-disk-space.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


