From Michael McCandless <>
Subject Re: fsync
Date Sun, 20 Dec 2009 14:15:41 GMT
On Sun, Dec 20, 2009 at 12:14 AM, Marvin Humphrey
<> wrote:

>> I also think that Mike is making too much distinction between
>> "relying on the file system" and "using shared memory".  I think
>> one can safely view them as two interfaces to the same underlying
>> mechanism.
> I agree with that, and it was kind of confusing since Mike had
> previously seemed to suggest that the flush() semantics were a
> "Lucy-ification" of the Lucene model.

I still need to answer on 2026, but this caught my eye first ;)

Using the filesystem for sharing vs. using shared memory seems quite
different to me.  E.g., one could create a rich data structure (say an
FST) to represent the terms dict in RAM, then share that terms dict
amongst many processes, right?

Whereas using the filesystem really requires a file-flat data
structure.

I.e., "going through the filesystem" and "going through shared memory"
are two alternatives for enabling efficient process-only concurrency
models.  They have interesting tradeoffs (I'll answer more in 2026),
but the fact that the OS backs one of them with a file seems like a
salient difference.

Net/net, I think the proposed Lucy flush vs. commit semantics are a
good approach, given Lucy's design constraints.  Just like with
Lucene's NRT, Lucy users won't be forced to trade off reopen time for
durability.
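
To make the flush vs. commit distinction concrete, here is a minimal
sketch in Python.  It assumes flush only hands the bytes to the OS (so
other processes can see them), while commit additionally asks the OS to
push them to stable storage.  SegmentWriter, add, flush, commit and
demo are hypothetical names for illustration, not Lucy's or Lucene's
actual API.

```python
import os
import tempfile


class SegmentWriter:
    """Hypothetical append-only writer; not Lucy's or Lucene's API."""

    def __init__(self, path):
        self._f = open(path, "ab")

    def add(self, data):
        # Bytes land in a user-space buffer first.
        self._f.write(data)

    def flush(self):
        # Hand the bytes to the OS.  Other processes reading the file
        # can now see them, but a power loss could still drop them.
        self._f.flush()

    def commit(self):
        # Flush, then ask the OS to write the file to stable storage.
        self.flush()
        os.fsync(self._f.fileno())

    def close(self):
        self._f.close()


def demo():
    """Return what a second reader sees after flush and after commit."""
    path = os.path.join(tempfile.mkdtemp(), "seg.dat")
    writer = SegmentWriter(path)

    writer.add(b"doc1")
    writer.flush()  # cheap: visible to readers, not yet durable
    with open(path, "rb") as reader:
        after_flush = reader.read()

    writer.add(b"doc2")
    writer.commit()  # more expensive: visible and durable
    with open(path, "rb") as reader:
        after_commit = reader.read()

    writer.close()
    return after_flush, after_commit
```

The point of the split is that a reader can reopen after a cheap flush
without waiting for an fsync, and the fsync cost is only paid when
durability is actually wanted.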


