incubator-lucy-dev mailing list archives

From: Marvin Humphrey <>
Subject: Posting codecs
Date: Sat, 20 Sep 2008 02:51:17 GMT
[This is a continuation of a thread from the KinoSearch mailing list  
entitled "Queries with large number of hits."]

On Sep 19, 2008, at 11:25 AM, Nathan Kurz wrote:

> Thanks for your quick research and improvements, Marvin.  You prompted
> me to run a few quick tests as well:
> kinosearch_instream/perl$ search "the OR and OR of OR for" (ten times)
> kinosearch_instream/perl$ opreport -alt2 */
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  cum. samples  %        cum. %     symbol name
> 190      190           23.8394  23.8394    kino_InStream_read_c32
> 112      302           14.0527  37.8921    kino_ScorePost_read_record
> 68       370            8.5320  46.4241    kino_ScorerDocQ_top_next
> 55       425            6.9009  53.3250    kino_TermScorer_next
> 51       476            6.3990  59.7240    kino_InStream_read_u8
> 45       521            5.6462  65.3701    kino_SegPList_next
> 31       552            3.8896  69.2597    __i686.get_pc_thunk.bx
> 30       582            3.7641  73.0238    advance_after_current
> 29       611            3.6386  76.6625    anonymous symbol from section .plt
> 25       636            3.1368  79.7992    kino_MemMan_wrapped_realloc
> 22       658            2.7604  82.5596    kino_ScorePostScorer_tally
> 17       675            2.1330  84.6926    kino_ORScorer_tally
> "opannotate --source */" is also useful to glance at, as
> is adding the '-c' flag to opreport to see where the calls are coming
> from and where the functions are spending their time internally.

> The main thing that jumped out is that the function call to
> Instream_read_c32 is killing us.  I don't see any way to have this
> remain a per-int function call and still get good performance.

Let's set a goal of implementing PForDelta, and figure out how to get  
there from here.

> You need to figure out some way to decode the entire posting with fewer
> function calls, and this is where the bulk of them are coming from.
> I'd suggest having the Store level return an undecoded raw posting,
> and let the Posting class decode the parts it wants.  That way the
> VByte code can be macros that work on a local buffer in a tight loop.

It's a little tricky to give anything other than the InStream object  
itself direct access to the raw buffer.  It's inevitable that the  
bytes that form a compressed integer will straddle a buffer boundary  
from time to time.  InStream_read_c32 takes this into account by  
calling refill() when necessary, but how do you factor that in when  
working externally?  MathUtils.h already has DECODE_C32 and ENCODE_C32  
macros and they're pretty gnarly, even though they assume that the  
buffer they're pointed at holds all the needed bytes.
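
For illustration, here's a minimal sketch of a 7-bits-per-byte varint  
decoder that assumes a complete local buffer (LSB-first, Lucene-VInt  
style -- the actual C32 layout in MathUtils.h may differ, and the names  
here are made up).  The real InStream_read_c32 can't make that  
assumption; it has to call refill() whenever the integer straddles the  
end of the window.

    #include <stdint.h>
    #include <stddef.h>

    /* Decode one varint from `buf`, which is assumed to contain all the
     * bytes of the integer.  `consumed` reports how many bytes were used.
     * A streaming reader can't assume this: it has to check for the end
     * of its window and refill() mid-integer when the bytes straddle a
     * buffer boundary. */
    static uint32_t
    decode_varint(const uint8_t *buf, size_t *consumed)
    {
        uint32_t value = 0;
        int      shift = 0;
        size_t   i     = 0;
        uint8_t  byte;

        do {
            byte   = buf[i++];
            value |= (uint32_t)(byte & 0x7F) << shift;
            shift += 7;
        } while (byte & 0x80);

        *consumed = i;
        return value;
    }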

Alternatively, if you decide to guarantee that any buffer you supply  
to an outsider holds all the necessary bytes for the C32, then the  
cost of scanning the stream to find the end of the record bites you.

Another theoretical possibility would be to store the size of each  
undecoded posting in the index, perhaps as a prepended C32 -- similar  
to how string data is stored.  However, that would take extra space  
and extra time to decode, and would be prohibitively expensive for  
simple Posting formats like MatchPosting.
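
A sketch of how that might read, with the caveat that the types and  
helper names here are hypothetical stand-ins rather than the existing  
KinoSearch API:

    #include <stdint.h>
    #include <stddef.h>

    /* Stand-ins for the real stream API. */
    typedef struct InStream InStream;
    uint32_t read_c32(InStream *instream);
    void     read_bytes(InStream *instream, void *buf, size_t len);

    /* Hypothetical on-disk layout: [len:C32][payload, `len` bytes]. */
    enum { MAX_POSTING_LEN = 1024 };   /* assumed scratch cap */

    typedef struct RawPosting {
        uint32_t len;
        uint8_t  blob[MAX_POSTING_LEN];
    } RawPosting;

    static void
    read_raw_posting(InStream *instream, RawPosting *raw)
    {
        raw->len = read_c32(instream);              /* prepended size */
        read_bytes(instream, raw->blob, raw->len);  /* complete record */
        /* A Posting subclass could now decode raw->blob with tight local
         * macros, with no chance of needing a mid-integer refill(). */
    }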

> The second thing that jumps out is that decompressing VBytes is
> expensive.  P4Delta might be a significant advantage, or perhaps there
> are ways to optimize the decompression with the existing scheme.

In the study referenced earlier in that thread, compressed integers  
fared the worst of all the
competitors.  We'll still need them for lots of other things, but  
let's try to get PForDelta happening in the postings data.
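
To make the target concrete, here's a simplified sketch of the  
PForDelta decode path -- not KinoSearch code, and it simplifies the  
exception mechanism (the original scheme chains exceptions through the  
packed slots; this one stores their positions explicitly).  The point  
is the shape of the loop: a whole block gets unpacked with no  
per-integer function calls, then a few exceptions get patched in.

    #include <stdint.h>
    #include <stddef.h>

    enum { BLOCK_SIZE = 128 };   /* typical PForDelta block size */

    /* Unpack BLOCK_SIZE values of `bit_width` bits each (LSB-first)
     * from the packed word array into `out`. */
    static void
    unpack_bits(const uint32_t *packed, uint32_t bit_width, uint32_t *out)
    {
        uint32_t mask = (bit_width == 32)
                      ? 0xFFFFFFFFu
                      : ((1u << bit_width) - 1);
        for (size_t i = 0; i < BLOCK_SIZE; i++) {
            size_t   bit_pos = i * bit_width;
            size_t   word    = bit_pos >> 5;
            uint32_t shift   = bit_pos & 31;
            uint64_t chunk   = packed[word];
            if (shift + bit_width > 32) {
                chunk |= (uint64_t)packed[word + 1] << 32;
            }
            out[i] = (uint32_t)(chunk >> shift) & mask;
        }
    }

    typedef struct PForBlock {
        uint32_t        bit_width;       /* bits per packed value         */
        uint32_t        num_exceptions;  /* values that didn't fit        */
        const uint32_t *packed;          /* BLOCK_SIZE * bit_width bits   */
        const uint32_t *exc_positions;   /* slots that need patching      */
        const uint32_t *exc_values;      /* their full values             */
    } PForBlock;

    static void
    pfor_decode(const PForBlock *block, uint32_t *out)
    {
        unpack_bits(block->packed, block->bit_width, out);
        for (uint32_t i = 0; i < block->num_exceptions; i++) {
            out[block->exc_positions[i]] = block->exc_values[i];
        }
        /* For doc nums, the caller would then prefix-sum the deltas to
         * recover absolute values. */
    }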

> The last thing (sort of a non-thing) is that to my surprise the
> double/triple buffering of the Posting data doesn't seem to have much
> of a negative effect.  I still think it's worth trying to avoid this,
> but it's barely on the current radar.

The double/triple buffering may not cause a slowdown, but it's worth  
going with mmap for the backing buffer on InStream simply for memory  
footprint.  Since all index files are read-only once committed, they  
can be shared.  That means if you have multiple processes with  
multiple IndexReaders all reading the same file, they can all  
effectively use the same backing buffer, and we'll let the OS manage  
page swapping.
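
A minimal sketch of the mmap approach, assuming POSIX and a committed,  
read-only index file (error handling trimmed, names hypothetical):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stddef.h>

    /* Map an index file read-only.  Because the mapping is PROT_READ and
     * MAP_SHARED, every IndexReader in every process that maps the same
     * file shares the same physical pages, and the OS handles paging. */
    static const char*
    map_index_file(const char *path, size_t *len_out)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { return NULL; }

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return NULL; }

        void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED,
                          fd, 0);
        close(fd);   /* the mapping remains valid after close() */
        if (addr == MAP_FAILED) { return NULL; }

        *len_out = (size_t)st.st_size;
        return (const char*)addr;
    }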

> Did you also recreate an index without the position information, or
> are these times based on just skipping over the position info in the
> index?

I recreated the index, using a new Schema -- there was no other  
choice.  MatchPosting as presently implemented expects to see  
<doc,freq> for each posting.  (It really ought to be just <doc>, but I  
hacked it so we could get these tests going.)  If you give it  
ScorePosting data <doc,freq,boost,positions>, it will misinterpret the  
boost and positions as the next doc's data, get out of sync, return  
wrong results, and possibly crash with an EOF error.
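
To spell out the failure mode, here's a toy version of the desync  
(plain uint32_t slots standing in for the real delta/C32 encoding,  
with the details of ScorePosting's serialization glossed over):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* Two ScorePosting-style records, flattened into one stream:
         *   <doc=5, freq=2, boost, pos, pos>  <doc=9, freq=1, boost, pos> */
        uint32_t stream[] = { 5, 2, 100, 7, 8,   9, 1, 100, 3 };
        size_t   i = 0;

        /* A MatchPosting-style reader consumes only <doc, freq> per record. */
        for (int rec = 0; rec < 3; rec++) {
            uint32_t doc  = stream[i++];
            uint32_t freq = stream[i++];
            printf("doc %u freq %u\n", (unsigned)doc, (unsigned)freq);
        }
        /* Prints "doc 5 freq 2", then "doc 100 freq 7": after the first
         * record it's reading the boost and a position as the next doc.
         * In a real index it eventually runs past the end of the file,
         * hence the EOF error. */
        return 0;
    }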

Marvin Humphrey
Rectangular Research
