incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Optimizing InStream for mmap
Date Sat, 15 Nov 2008 23:38:34 GMT

In commits r3895 - r3925 to the KinoSearch repository, InStream has been
optimized for internal use of mmap() on Unixen.  

  * On 32-bit Unixen, InStream provides access to the file data via a
    variable-width "sliding window". The window is opened and closed by
    repeated calls to mmap() and munmap().
  * On systems without sys/mman.h (e.g. Windows), we fall back to using
    a malloc'd buffer and sequential reads to fake up a sliding window.
  * On 64-bit Unixen, mmap() only gets called once, at object creation time.
    There's no need for a sliding window.
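
To illustrate the 32-bit case, here is a minimal sketch of a sliding window
built on mmap() and munmap(). It is not the KinoSearch implementation: the
function name sum_file_windowed() and its byte-summing workload are
hypothetical, chosen only to show one window being opened, consumed, and
closed before the next one is mapped.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of KinoSearch.
 * Sum every byte of the file at `path`, mapping at most `window` bytes
 * (plus alignment slack of less than one page) at a time.
 * Stores the result in *sum_out.  Returns 0 on success, -1 on failure. */
static int
sum_file_windowed(const char *path, size_t window, unsigned long *sum_out)
{
    struct stat   st;
    unsigned long sum    = 0;
    off_t         offset = 0;
    long          page   = sysconf(_SC_PAGESIZE);
    int           fd;

    assert(window > 0);
    if (page <= 0) { return -1; }

    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return -1; }

    while (offset < st.st_size) {
        /* mmap() needs a page-aligned file offset, so round down and
         * remember how many leading bytes to skip. */
        off_t  aligned   = offset - (offset % page);
        size_t slack     = (size_t)(offset - aligned);
        off_t  remaining = st.st_size - offset;
        size_t len       = remaining < (off_t)window
                         ? (size_t)remaining : window;
        unsigned char *map = mmap(NULL, len + slack, PROT_READ,
                                  MAP_PRIVATE, fd, aligned);
        size_t i;

        if (map == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

        /* Consume the window. */
        for (i = 0; i < len; i++) { sum += map[slack + i]; }

        /* Close the window before sliding forward. */
        munmap(map, len + slack);
        offset += (off_t)len;
    }

    close(fd);
    *sum_out = sum;
    return 0;
}
```

Address space in use never exceeds roughly one window, which is the point of
the 32-bit strategy. The cost is one mmap()/munmap() pair per window.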

For optimum performance under 64-bit Unixen, client code can request a
window the width of the entire file:

  Foo*
  Foo_new(InStream *instream)
  {
      Foo   *self    = (Foo*)CREATE(NULL, FOO);
      i64_t  len     = InStream_Length(instream);
      self->buf      = InStream_Buf(instream, len); /* map whole file */
      self->limit    = self->buf + len;             /* one past last byte */
      self->instream = REFCOUNT_INC(instream);
      return self;
  }

Such code would work fine for small files on 32-bit systems.  Large files,
however, would cause such systems to blow up, either by exceeding addressable
space and causing mmap() to fail, or, for systems without mmap(), through
excessive memory consumption.

To be portable to 32-bit systems, core modules will have to avoid mapping
large files.  If we want to max out the performance of PostingLists and
Lexicons on 64-bit systems, that means we'll have to accept the increased
maintenance burden of providing two different behaviors.  I don't think the
burden will be too heavy, though.  
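
As a rough sketch of what choosing between the two behaviors might look like,
the function below decides whether a module should request the whole file or
fall back to a bounded window. Everything in it is an assumption made for
illustration: the name should_map_whole_file(), the pointer_bytes parameter
(normally sizeof(void*)), and the 256 MiB cap are hypothetical, not anything
in KinoSearch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on whole-file requests when pointers are narrower
 * than 64 bits.  The value is an arbitrary illustration. */
#define WHOLE_FILE_CAP_32 ((int64_t)1 << 28)   /* 256 MiB */

/* Return 1 if a caller may request a window spanning the entire file,
 * 0 if it should use a bounded sliding window instead.
 * `pointer_bytes` is the pointer width in bytes; pass sizeof(void*). */
static int
should_map_whole_file(int64_t file_len, size_t pointer_bytes)
{
    assert(file_len >= 0);
    if (pointer_bytes >= 8) {
        return 1;   /* 64-bit: address space is not the constraint */
    }
    return file_len <= WHOLE_FILE_CAP_32;
}
```

A caller would check should_map_whole_file(len, sizeof(void*)) once at
construction time and pick its read path accordingly, so the
platform-specific branch stays confined to one place.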

Marvin Humphrey
