incubator-lucy-dev mailing list archives

From Nathan Kurz
Subject Re: fsync
Date Wed, 23 Dec 2009 05:31:02 GMT
On Sun, Dec 20, 2009 at 6:15 AM, Michael McCandless wrote:
> On Sun, Dec 20, 2009 at 12:14 AM, Marvin Humphrey wrote:
>>> I also think that Mike is making too much distinction between
>>> "relying on the file system" and "using shared memory".  I think
>>> one can safely view them as two interfaces to the same underlying
>>> mechanism.
> Using the filesystem for sharing vs using shared memory seem quite
> different to me.  EG one could create a rich data structure (say an
> FST) to represent the terms dict in RAM, then share that terms dict
> amongst many processes, right?
> Whereas, using the filesystem really requires a file-flat data
> structure?

I guess it depends on your point of view:  it would be hard (but not
impossible) to do true objects in an mmapped file, but it would be
very easy to do has-a type relationships using file offsets as
pointers.  I tend to have a data-centric (rather than object-centric)
point of view, but from here I don't see any data structures that
would be significantly more difficult.
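
A minimal sketch of the "file offsets as pointers" idea, assuming a
made-up record layout: a term record "has-a" postings block, and the
relationship is stored as a byte offset from the start of the file
rather than as a memory address.  The names build_index, read_index,
and roundtrip, and the layout itself, are hypothetical and are not
taken from Lucy or Lucene.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: [postings offset: u64][term length: u32][term bytes]
HEADER = struct.Struct("<QI")
# Hypothetical postings block: [count: u32][count doc ids: u32 each]
COUNT = struct.Struct("<I")


def build_index(path, term, doc_ids):
    """Write one term record that refers to its postings by file offset."""
    term_bytes = term.encode("utf-8")
    postings_offset = HEADER.size + len(term_bytes)
    with open(path, "wb") as f:
        f.write(HEADER.pack(postings_offset, len(term_bytes)))
        f.write(term_bytes)
        f.write(COUNT.pack(len(doc_ids)))
        f.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))


def read_index(path):
    """Map the file and follow the stored offset like a pointer."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        postings_offset, term_len = HEADER.unpack_from(mm, 0)
        term = mm[HEADER.size:HEADER.size + term_len].decode("utf-8")
        (count,) = COUNT.unpack_from(mm, postings_offset)
        doc_ids = struct.unpack_from(
            f"<{count}I", mm, postings_offset + COUNT.size)
        return term, list(doc_ids)


def roundtrip(term, doc_ids):
    """Build a throwaway index file and read it back through mmap."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.bin")
        build_index(path, term, doc_ids)
        return read_index(path)
```

Because the reference is an offset, the reader never needs to know
where the mapping landed in its own address space.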

Do you have a link that explains the FST you refer to?  I'm searching,
and not finding anything that's a definite match.  "Field select

> Ie, "going through the filesystem" and "going through shared memory"
> are two alternatives for enabling efficient process-only concurrency
> models.  They have interesting tradeoffs (I'll answer more in 2026),
> but the fact that one of them is backed by a file by the OS seems like
> a salient difference.

For me, file backing doesn't seem like a big difference.  Fast-moving
changes will never hit disk, and I presume there is some way you can
convince the system never to actually write out the slow changes
(maybe mmap on a RamFS?).  I think the real difference is between
sharing between threads and sharing between processes: basically,
whether or not you can assume that the address space is identical in
all the 'sharees'.
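
A rough illustration of that point, assuming two independent mappings
of one file inside a single process as a stand-in for two separate
processes.  Each mapping has its own base address, so a raw pointer
written by one would be meaningless to the other, while an offset
works from either side.  The function name share_through_file and the
slot layout are hypothetical.

```python
import mmap
import os
import struct
import tempfile


def share_through_file(value):
    """Write `value` through one mapping and read it through another."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        # Two separate mappings of the same file: one read-write,
        # one read-only.  They stand in for two processes.
        writer = mmap.mmap(fd, mmap.PAGESIZE)
        reader = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
        try:
            # Slot 0 stores the payload's offset, not its address.
            payload_offset = 64
            struct.pack_into("<Q", writer, 0, payload_offset)
            struct.pack_into("<q", writer, payload_offset, value)
            # The reader follows the same offset from its own base.
            (offset,) = struct.unpack_from("<Q", reader, 0)
            (seen,) = struct.unpack_from("<q", reader, offset)
            return seen
        finally:
            reader.close()
            writer.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

Nothing here forces the data to disk.  Placing the file on a
memory-backed filesystem (on Linux, /dev/shm is commonly a tmpfs
mount) is one way to try the RamFS variant, though that is an
assumption about the host system.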

I'll mention that, given the New Year, at first I thought 2026 was
your realistic time estimate rather than a tracking number.


I started thinking about how one could do objects with mmap, and came
up with an approach that doesn't quite answer that question but might
actually work out well for other problems:  you could literally
compile your index and link it in as a shared library.  Each term
would be a symbol, and you'd use 'dlsym' to find the associated data.

It's possible that you could even use library versioning to handle
updates, and stuff like RTLD_NEXT to handle multiple segments. Perhaps
a really bad idea, but I find it an intriguing one.   I wonder how
fast using libdl would be compared to writing your own lookup tables.
I'd have to guess it's fairly efficient.
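
A sketch of the lookup step only, assuming a POSIX system: it resolves
a name to an address through the dynamic loader, using Python's ctypes
(which loads libraries with dlopen and resolves names with dlsym on
such systems).  The C library already loaded into the process stands
in for a compiled index, and the function name resolve is
hypothetical.  Compiling an index into a library, versioning, and
RTLD_NEXT chaining are not shown.

```python
import ctypes


def resolve(symbol_name, library=None):
    """Return the address of `symbol_name`, or None if it is not found.

    With library=None the lookup covers symbols already visible to the
    running process; pass a path to search a specific shared library.
    """
    handle = ctypes.CDLL(library)
    try:
        # Attribute access performs the by-name symbol lookup.
        func = getattr(handle, symbol_name)
    except AttributeError:
        return None
    return ctypes.cast(func, ctypes.c_void_p).value


if __name__ == "__main__":
    # "strlen" plays the role of a term that exists in the index.
    print(resolve("strlen") is not None)
    # A name that no library exports plays the role of a missing term.
    print(resolve("term_that_is_not_in_any_index"))
```

A real term dictionary would also need a mapping from arbitrary term
bytes to valid symbol names, which this sketch does not attempt.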

Nathan Kurz
