incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Re: Optimizing InStream for mmap
Date Mon, 17 Nov 2008 17:56:59 GMT
On Sun, Nov 16, 2008 at 08:38:09PM -0600, Peter Karman wrote:
> How big an issue is 32-bit support? 

I don't think it's that big a deal.

There are 4 main index reading components in core.  DocReader and
TermVectorsReader are very similar to each other, and neither would benefit
significantly from refactoring for mmap, either in terms of simplicity or
performance.  Each uses two files per segment: a data stream and an index
stream.  There's no point in mapping the data stream.  Since the index stream
is just a stack of 64-bit integers, we could map it and access it as an array
of i64_t, but I can't imagine we'd see a measurable performance gain.  Even if
we did, it would only be on 32-bit systems, because on 64-bit systems the
InStream will be mapping the whole file anyway internally and the overhead of
accessing the mapped file as a stream (using Seek, Read_U64, etc) wouldn't be
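
To make the "array of i64_t" idea above concrete, here is a minimal POSIX sketch of mapping an index stream and reading an offset by plain array indexing. The helper names (map_i64_file, unmap_i64_file) are hypothetical, not Lucy's API, and the sketch assumes the file holds 64-bit integers in host byte order; a real index file with a fixed on-disk byte order would need a swap on some platforms.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a file of 64-bit integers read-only.
 * Assumes host byte order. Returns NULL on failure; on success,
 * *count receives the number of entries. */
static const int64_t *map_i64_file(const char *path, size_t *count) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0
        || (size_t)st.st_size % sizeof(int64_t) != 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED) return NULL;

    *count = (size_t)st.st_size / sizeof(int64_t);
    return (const int64_t *)addr;
}

/* Release a mapping obtained from map_i64_file. */
static void unmap_i64_file(const int64_t *base, size_t count) {
    munmap((void *)base, count * sizeof(int64_t));
}
```

With the mapping in hand, looking up the data-stream offset for entry n is just base[n], in place of a Seek followed by Read_U64 on the stream.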

That leaves the lexicons and the posting lists.  Right now, supporting
divergent code for those components doesn't look too bad, but maybe that will
change if we can start dreaming up ways to exploit the mapped files.

I'd really like to have zero cache loads on Searcher startup, which means
changing how the Lexicon indexes work.

Even better would be a SortCacheWriter component that eliminates the
substantial cost of warming sort caches by writing a mappable file at index
time.  Loading the sort cache would then be as simple as mapping the file, and
warming it for ALL forks would be as simple as "cat /path/to/index/* >
/dev/null".  However, I don't presently envision such a component as belonging
to core.
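
As a rough illustration of the write-at-index-time, map-at-search-time idea, the sketch below dumps one sort ordinal per document and later maps the file back in. Everything here is an assumption made for the example: the function names, the flat layout of 32-bit ordinals, and host byte order. No such component exists in core.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Index time (hypothetical): write one 32-bit sort ordinal per document,
 * in host byte order. Returns 0 on success, -1 on failure. */
static int sort_cache_write(const char *path, const int32_t *ords, size_t n) {
    FILE *out = fopen(path, "wb");
    if (out == NULL) return -1;
    size_t written = fwrite(ords, sizeof(int32_t), n, out);
    int close_err = fclose(out);
    return (written == n && close_err == 0) ? 0 : -1;
}

/* Search time (hypothetical): "loading" is only a mapping. Nothing is
 * parsed or copied; pages are faulted in on first access. */
static const int32_t *sort_cache_load(const char *path, size_t *n) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED lets every process that maps the file share the same
     * pages in the OS cache. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED) return NULL;

    *n = (size_t)st.st_size / sizeof(int32_t);
    return (const int32_t *)addr;
}
```

Because the data lives in the OS page cache and not in per-process memory, reading the file once from any process is enough to warm it for all of them.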

> Can you even buy a 32-bit box anymore?  I expect the ones in existence will
> be around for a few more years, but when someone like Apple drops support
> for them in their OS, you know the end is nigh.

I think it's too early to drop support for 32-bit.  And if we did, IMO we'd
have to fail at compile-time -- it's not acceptable to have a search app work
for a while and then suddenly blow up when the index reaches a threshold size.
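
For reference, failing at compile-time could look something like the sketch below, which uses the classic negative-array-size trick so that it works on pre-C11 compilers. The macro name and the exact conditions checked are assumptions for illustration; this is only what a 64-bit-only build might do, not something the project does.

```c
#include <stddef.h>

/* Compile-time assertion for pre-C11 compilers: if cond is false, the
 * array size is -1 and the translation unit fails to compile. */
#define COMPILE_TIME_ASSERT(cond, name) \
    typedef char compile_time_assert_##name[(cond) ? 1 : -1]

/* Hypothetical policy: refuse to build unless pointers and size_t are at
 * least 64 bits wide, so a whole-file mapping can always be addressed. */
COMPILE_TIME_ASSERT(sizeof(void *) >= 8, pointers_must_be_64_bit);
COMPILE_TIME_ASSERT(sizeof(size_t) >= 8, size_t_must_be_64_bit);
```

On a 32-bit target the build stops with an error that names the failed typedef, so the problem shows up before any index exists.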

Marvin Humphrey
