lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Getting fsync out of the loop
Date Wed, 07 Apr 2010 03:51:43 GMT
Earwin - do you have some numbers to share on the running time of the
indexing application? You mentioned that moving fsync into a background
thread improves the running time, but I'm curious to know by how much.

Shai

On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot <earwin@gmail.com> wrote:

> > Running out of disk space with fsync disabled won't lead to corruption.
> > Even kill -9 the JRE process with fsync disabled won't corrupt.
> > In these cases index just falls back to last successful commit.
> >
> > It's "only" power loss / OS / machine crash where you need fsync to
> > avoid possible corruption (corruption may not even occur w/o fsync if
> > you "get lucky").
>
> Sorry to disappoint you, but running out of disk space is worse than
> kill -9.
> You can write the file (into the OS cache, in fact) and close it, all
> without getting any exceptions. And then it never gets flushed to disk,
> because the disk is full.
> This can happen to the segments file (and the old one is deleted under
> the default deletion policy). It can also happen to the fat freq/prox
> files referenced by the segments file (and yeah, the old segments file
> is deleted, so there is no falling back).
>
> > What if your background thread simply committed every couple of minutes?
> > What's the difference between taking the snapshot (which means you had
> > to call commit previously) and commit it, to call iw.commit by a
> > background merge?
> --
> > But: why do you need to commit so often?
> To see stuff on reopen? Yes, I know about NRT.
>
> > You've reinvented autocommit=true!
> ?? I'm doing regular commits, syncing down every Nth one.
>
> > Doesn't this just BG the syncing?  Ie you could make a dedicated
> > thread to do this.
> Yes, exactly, this BGs the syncing to a dedicated thread. Threads
> doing indexing/merging can continue unhampered.
>
> > One possible win with this approach is... the cost of fsync should go
> > way down the longer you wait after writing bytes to the file and
> > before calling fsync.  This is because typically OS write caches
> > expire by time (eg 30 seconds) so if you wait long enough the bytes
> > will already at least be delivered to the IO system (but the IO system
> > can do further caching which could still take time).  On Windows at
> > least I definitely noticed this effect -- wait some before fsync'ing
> > and it's net/net much less costly.
> Yup. In fact you can just hold on to the latest commit for N seconds,
> then switch to the new latest commit.
> The OS will fsync everything for you.
>
>
> I'm just playing around with a stupid idea. I'd like to have an NRT
> look-alike without binding readers and writers. :)
> Right now it's probably best for me to save my time and cut over to
> current NRT.
> But an important lesson was learnt: not fsyncing blows up your index
> when you run out of disk space.
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
>
>
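
For readers following the thread, the "BG the syncing to a dedicated thread" idea quoted above could be sketched roughly as below. This is only an illustration, not code from Earwin's application or from Lucene: BackgroundSyncer and syncLater are hypothetical names, and the sketch assumes the files were already written and closed by the indexing thread. It relies only on the standard FileChannel.force call. The returned Future carries any IOException raised during the sync, which matters given the out-of-disk-space case described in the thread, where the failure may only become visible at that point.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not a Lucene class): fsyncs files on one dedicated thread. */
final class BackgroundSyncer implements AutoCloseable {
    // A single worker thread, so indexing/merging threads never block on fsync.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "bg-fsync");
        t.setDaemon(true);
        return t;
    });

    /**
     * Queues the given, already written files for fsync.
     * The Future completes with the number of files synced, or fails with
     * the IOException thrown while opening or syncing one of them.
     */
    Future<Integer> syncLater(List<Path> files) {
        // Copy the list so later changes by the caller do not affect the queued task.
        List<Path> snapshot = new ArrayList<>(files);
        return executor.submit(() -> {
            int synced = 0;
            for (Path file : snapshot) {
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                    ch.force(true); // flush file content and metadata to the storage device
                }
                synced++;
            }
            return synced;
        });
    }

    /** Stops accepting new work; tasks already queued are still allowed to run. */
    @Override
    public void close() {
        executor.shutdown();
    }
}
```

Under these assumptions, a caller would keep the previous commit point alive until the Future for the new commit's files completes without error, and only then let the deletion policy remove the old segments file.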
