From: Nathan Kurz <nate@verse.com>
To: lucy-dev@lucene.apache.org
Subject: Re: Posting codecs
Date: Fri, 26 Sep 2008 01:30:15 -0600

I started looking at the code a bit, trying to figure out a more incremental way of approaching this. I now think there might be room for one, although as always my urge to boil oceans is hard to suppress.

The first step would be to switch InStream to using mmap() instead of its user-level buffering. This looks surprisingly easy, presuming we are allowed to ignore Windows compatibility, or at least Windows performance.

The second step would then be to expose InStream->buf, probably via an accessor so that EOF can be reported. I said I wasn't intending to expose the raw buffer, but once we are using mmap we should never need to call refill(). There would need to be some implied rules, such as never reading beyond the current Posting.

The third step is to change ScorePost->read_record to decode the VBytes directly from the raw buffer, or perhaps better, to create a FastScorePost subclass that does this. InStream->seek() would be called after the buffer is read to update InStream->buf and to assert() that no reads went too far.
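
Here is a rough sketch of what I'm picturing for those three steps, just to make the shape concrete. To be clear, none of this is real KinoSearch code: the file name and record layout are invented, I'm decoding two bare VBytes per record rather than the actual on-disk format, and I'm ignoring error handling and Windows entirely.

    /* Sketch only: mmap() a postings file and decode VByte-compressed
     * integers straight out of the mapped buffer, with no refill() and
     * no per-byte function calls.  POSIX only. */
    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Decode one VByte-compressed integer, advancing *buf. */
    static uint32_t
    decode_vbyte(const uint8_t **buf)
    {
        uint32_t value = 0;
        uint32_t shift = 0;
        uint8_t  byte;
        do {
            byte   = *(*buf)++;
            value |= (uint32_t)(byte & 0x7F) << shift;
            shift += 7;
        } while (byte & 0x80);
        return value;
    }

    int
    main(void)
    {
        const char *path = "seg_1/postings.dat";    /* made-up name */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* Read-only shared mapping: no user-level buffer, no copying,
         * and the pages are shared with any other process that maps
         * the same committed (hence immutable) index file. */
        const uint8_t *base =
            mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        const uint8_t *buf   = base;
        const uint8_t *limit = base + st.st_size;
        uint32_t doc_num = 0;

        /* The tight loop that read_record (or FastScorePost) would run:
         * decode a whole record before touching the stream object. */
        while (buf < limit) {
            uint32_t doc_delta = decode_vbyte(&buf);
            uint32_t freq      = decode_vbyte(&buf);
            doc_num += doc_delta;
            assert(buf <= limit);   /* stand-in for the seek()-time check */
            printf("doc %u freq %u\n", doc_num, freq);
        }

        munmap((void *)base, (size_t)st.st_size);
        close(fd);
        return 0;
    }

In the real thing the decode loop would live behind read_record (or the FastScorePost subclass) and the InStream accessor rather than in main(), but the inner loop is the point. The PROT_READ / MAP_SHARED mapping is also what buys the memory-footprint win mentioned further down: every process reading the same committed file shares the same pages.
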
The nice part about this approach is that we should get a substantial speed-up for the common use cases while requiring very few API changes. Old scorers et al. can continue as they did before, and new ones can change over gradually. The downside is that each Scorer remains tied to a particular index format. Long-term I still think this is disastrous, but in the short term it's not that bad. Perhaps we can delay solving this problem until you get P4Delta ready to go. Sound plausible?

Nathan Kurz
nate@verse.com

On Mon, Sep 22, 2008 at 12:35 AM, Nathan Kurz wrote:
> On Fri, Sep 19, 2008 at 8:51 PM, Marvin Humphrey wrote:
>> Let's set a goal of implementing PForDelta, and figure out how to get there from here.
>
> Sure, that seems like a fine goal. I'm not sure if you meant it this way, but I think it would be great to add PForDelta support alongside the existing VByte support rather than simply replacing it. While PForDelta might fully replace VByte eventually, it would be good to design an architecture that allows for multiple file formats, so that tricks like using Lucene index files directly are theoretically possible, and so that the path for creating custom file formats is better tested.
>
>>> You need to figure out some way to decode the entire posting with fewer function calls, and this is where the bulk of them are coming from. I'd suggest having the Store level return an undecoded raw posting, and let the Posting class decode the parts it wants. That way the VByte code can be macros that work on a local buffer in a tight loop.
>>
>> It's a little tricky to give anything other than the InStream object itself direct access to the raw buffer. It's inevitable that the bytes that form a compressed integer will straddle a buffer boundary from time to time.
>
> OK. I wasn't hoping to give access to the raw buffer, but to a cooked one guaranteed to contain the entire posting. But I'd forgotten that the size of the compressed posting isn't known in advance! The important part, though, is that the entire posting be decompressed in a tight loop with minimal function call overhead. So instead of my first suggestion, can we have the Store return a fully decoded Posting?
>
> More fully fleshed out (if still only partially informed), I'd like a system that looks something like this:
>
> 1) Scorer->init creates a template Posting which is passed to the Store for approval. The Store initializes itself and answers affirmatively if it can handle the particular flavor of Posting proposed. For example, Store->init_posting(PositionPosting) might be refused by a Store that only handles MatchPostings due to index limitations. Alternatively, a Store managing an index that has position information available might re-wire itself to use a faster method to fill a MatchPosting than a full PositionPosting.
>
> 2) The scoring loop starts and the Scorer calls Store->fill_posting(PositionPosting, min_doc_number). This is analogous to Scorer->SkipTo(), but decodes the entire posting from the index in a single pass. The Posting has nominal control over its own fields (it resizes its own position buffer, for example), but the Store is allowed to write data directly to the struct and the Scorer reads fields directly from it.
>
> My goals here are to have a system that dramatically speeds up the reads from the index, makes it possible for existing Scorers to work with new index formats, and allows for easy creation of custom Scorers for use with custom indexes.
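
To restate that proposal as a skeletal interface -- nothing below exists in KinoSearch, the struct layout and function names are just placeholders, and I've collapsed the MatchPosting/PositionPosting distinction into a single struct for brevity:

    /* Hypothetical interface sketch -- not actual KinoSearch API. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct Posting {
        uint32_t  doc_num;
        uint32_t  freq;
        uint32_t *positions;       /* owned and resized by the Posting */
        uint32_t  num_positions;
    } Posting;

    /* Opaque: only the Store knows whether the index uses VByte,
     * PForDelta, or something else entirely. */
    typedef struct Store Store;

    /* 1) Negotiation at Scorer->init time.  The Scorer proposes a
     *    template Posting; the Store answers whether it can fill that
     *    flavor (refusing, say, a position-bearing Posting when the
     *    index only stores match data), and may rewire itself to a
     *    faster fill routine based on what was asked for. */
    bool Store_init_posting(Store *store, Posting *template_posting);

    /* 2) The scoring loop.  Analogous to Scorer->SkipTo(), but decodes
     *    the entire posting at or after min_doc_num in one pass.  The
     *    Store writes directly into the struct; the Scorer reads the
     *    fields directly.  Returns false at the end of the postings. */
    bool Store_fill_posting(Store *store, Posting *posting,
                            uint32_t min_doc_num);

The Scorer's inner loop then becomes little more than "while (Store_fill_posting(store, &posting, target)) { ...score... }", with no knowledge of the file format leaking above the Store.
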
> My guess is that a move to this system would give an immediate 2x speedup, while making it easier to add custom features and future optimizations. I think the main downside is that it violates most people's conception of a standard object-oriented model. Personally, my main doubts are about the initialization/negotiation phase --- perhaps there is some better way to do this?
>
> And I apologize for my ad hoc terminology. It's partly that I'm not familiar with the proper KinoSearch terminology, and partly that the levels that are there don't quite match up with my mental model. What I want is for the Posting to be an intermediary between the Scorer and the Store, where the Posting has a known data layout and only the Store knows about the index internals, be it VByte or PForDelta. I'd be happy to call the Store an Index, apart from the confusion between the file and the class, but I'm not comfortable viewing it as a Stream. Is there a more standard way to express this?
>
>>> The last thing (sort of a non-thing) is that, to my surprise, the double/triple buffering of the Posting data doesn't seem to have much of a negative effect. I still think it's worth trying to avoid this, but it's barely on the current radar.
>>
>> The double/triple buffering may not cause a slowdown, but it's worth going with mmap for the backing buffer on InStream simply for memory footprint. Since all index files are read-only once committed, they can be shared. That means if you have multiple processes with multiple IndexReaders all reading the same file, they can all effectively use the same backing buffer, and we'll let the OS manage page swapping.
>
> I agree that there are multiple other reasons for using mmap here. I definitely want a design that makes it easy to drop in mmap versions of the index readers for all the reasons that you specify. I'm also hoping that we can make the rest of the search process efficient enough that the savings from moving to single buffering become worthwhile. But we have a ways to go until that is the case: maybe an 8x speedup from the recent baseline before it starts to be a limiting factor? I'm optimistic that this level of speedup is possible, though.
>
> Nathan Kurz
> nate@verse.com