lucene-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] Created: (LUCENE-2445) Perf improvements for the DocsEnum bulk read API
Date Tue, 04 May 2010 23:16:05 GMT
Perf improvements for the DocsEnum bulk read API

                 Key: LUCENE-2445
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
            Reporter: Michael McCandless
             Fix For: 4.0

I started to work on LUCENE-2443, to create a test showing the
problems, but it turns out none of the core codecs (even sep/intblock)
ever set a non-zero offset.

So I set forth to fix sep to do so, but ran into some issues w/ the
current bulk-read API that we should fix to make it higher
performance:
  * Filtering of deleted docs should be the caller's job (saves an
    extra pass through the docs)

  * Probably docs should arrive as deltas and caller sums these up to
    get the actual docID

  * Whether to load freqs or not should be separately controllable

  * We may want to require that the int[] for docs and freqs are
    "aligned", ie the offset into each is the same

  * Maybe we should separate out a BulkDocsEnum from DocsEnum.  We can
    make it optional for codecs (ie, we can emulate BulkDocsEnum from
    the DocsEnum)
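
To make the list above concrete, here is a rough sketch of how a bulk
reader could be emulated on top of a per-doc enumerator, with the
caller summing deltas and filtering deleted docs in the same pass.
Every name in it (SimpleDocsEnum, EmulatedBulkDocsEnum,
collectLiveDocs, fromArrays) is hypothetical and made up for
illustration; none of this is Lucene's actual API, and the real
signatures may well end up different.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BulkReadSketch {

    /** Hypothetical stand-in for a per-doc enumerator; NOT Lucene's DocsEnum. */
    interface SimpleDocsEnum {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc(); // next docID, or NO_MORE_DOCS when exhausted
        int freq();    // term frequency of the current doc
    }

    /** Hypothetical bulk reader emulated on top of the per-doc enumerator. */
    static final class EmulatedBulkDocsEnum {
        private final SimpleDocsEnum in;
        private final boolean loadFreqs; // freqs are separately controllable
        private int lastDoc = 0;         // so the first delta equals the first docID

        EmulatedBulkDocsEnum(SimpleDocsEnum in, boolean loadFreqs) {
            this.in = in;
            this.loadFreqs = loadFreqs;
        }

        /**
         * Fills docDeltas (and freqs, if enabled) starting at index 0, so the
         * two arrays stay aligned. Deleted docs are NOT filtered here.
         * Returns the number of entries written; 0 means exhausted.
         */
        int read(int[] docDeltas, int[] freqs) {
            int count = 0;
            while (count < docDeltas.length) {
                int doc = in.nextDoc();
                if (doc == SimpleDocsEnum.NO_MORE_DOCS) {
                    break;
                }
                docDeltas[count] = doc - lastDoc;
                if (loadFreqs) {
                    freqs[count] = in.freq();
                }
                lastDoc = doc;
                count++;
            }
            return count;
        }
    }

    /** Caller side: sum the deltas and skip deleted docs in a single pass. */
    static List<Integer> collectLiveDocs(EmulatedBulkDocsEnum bulk, BitSet deleted, int bufferSize) {
        int[] deltas = new int[bufferSize];
        List<Integer> live = new ArrayList<>();
        int doc = 0;
        int n;
        // freqs buffer is null: only valid when the reader was built with loadFreqs == false
        while ((n = bulk.read(deltas, null)) > 0) {
            for (int i = 0; i < n; i++) {
                doc += deltas[i];
                if (!deleted.get(doc)) {
                    live.add(doc);
                }
            }
        }
        return live;
    }

    /** Tiny in-memory enumerator over sorted docIDs, for trying the sketch out. */
    static SimpleDocsEnum fromArrays(final int[] docs, final int[] freqs) {
        return new SimpleDocsEnum() {
            private int pos = -1;

            @Override
            public int nextDoc() {
                if (pos < docs.length) {
                    pos++;
                }
                return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
            }

            @Override
            public int freq() {
                return freqs[pos];
            }
        };
    }
}
```

The point of the sketch is only that the codec-side read loop stays
trivial once deleted-doc filtering and delta summing move to the
caller; a codec with a native bulk implementation would replace
read() and skip the per-doc calls entirely.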
