lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2445) Perf improvements for the DocsEnum bulk read API
Date Tue, 04 May 2010 23:16:05 GMT
Perf improvements for the DocsEnum bulk read API
------------------------------------------------

                 Key: LUCENE-2445
                 URL: https://issues.apache.org/jira/browse/LUCENE-2445
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
            Reporter: Michael McCandless
             Fix For: 4.0


I started to work on LUCENE-2443, to create a test showing the
problems, but it turns out none of the core codecs (even sep/intblock)
ever set a non-zero offset.

So I set out to fix sep to do so, but ran into some issues with the
current bulk-read API that we should fix to improve its
performance:

  * Filtering of deleted docs should be the caller's job (saves an
    extra pass through the docs)

  * Docs should probably arrive as deltas, with the caller summing
    them to get the actual docID

  * Whether to load freqs or not should be separately controllable

  * We may want to require that the int[] for docs and freqs are
    "aligned", i.e. the offset into each is the same

  * Maybe we should separate out a BulkDocsEnum from DocsEnum.  We can
    make it optional for codecs (i.e., we can emulate BulkDocsEnum from
    the DocsEnum)
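
To make the proposals above concrete, here is a rough sketch of how
they might fit together. Everything in it is an assumption for
illustration, not the existing Lucene API: SimpleDocsEnum,
BulkDocsEnum, emulate, fromArrays and liveDocs are hypothetical names,
and a java.util.BitSet stands in for the deleted-docs set. The bulk
reader returns doc deltas into caller-owned buffers, uses one shared
offset for docs and freqs, loads freqs only when asked, and leaves
deleted-doc filtering to the caller.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BulkReadSketch {

    /** Hypothetical per-document enum, standing in for a codec's DocsEnum. */
    public interface SimpleDocsEnum {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc(); // next docID, or NO_MORE_DOCS when exhausted
        int freq();    // term frequency of the current doc
    }

    /** Hypothetical bulk API: fills caller-owned, aligned buffers. */
    public interface BulkDocsEnum {
        /**
         * Writes up to maxCount doc deltas (and freqs, if enabled) starting at
         * the same offset in both arrays. Deleted docs are not filtered here.
         * Returns the number of entries written; 0 means exhausted.
         */
        int read(int[] docDeltas, int[] freqs, int offset, int maxCount);
    }

    /** Emulates the bulk API on top of the per-document enum. */
    public static BulkDocsEnum emulate(final SimpleDocsEnum docs, final boolean loadFreqs) {
        return new BulkDocsEnum() {
            private int lastDoc = 0; // the first delta is relative to 0

            @Override
            public int read(int[] docDeltas, int[] freqs, int offset, int maxCount) {
                int count = 0;
                while (count < maxCount) {
                    int doc = docs.nextDoc();
                    if (doc == SimpleDocsEnum.NO_MORE_DOCS) {
                        break;
                    }
                    docDeltas[offset + count] = doc - lastDoc;
                    if (loadFreqs) {
                        freqs[offset + count] = docs.freq(); // same index as the delta
                    }
                    lastDoc = doc;
                    count++;
                }
                return count;
            }
        };
    }

    /** Array-backed enum, used only to drive the example. */
    public static SimpleDocsEnum fromArrays(final int[] docIDs, final int[] freqs) {
        return new SimpleDocsEnum() {
            private int pos = -1;

            @Override
            public int nextDoc() {
                pos++;
                return pos < docIDs.length ? docIDs[pos] : NO_MORE_DOCS;
            }

            @Override
            public int freq() {
                return freqs[pos];
            }
        };
    }

    /** Caller side: sums deltas into docIDs and filters deleted docs in the same pass. */
    public static List<Integer> liveDocs(BulkDocsEnum bulk, BitSet deleted, int bufferSize) {
        int[] deltas = new int[bufferSize];
        List<Integer> result = new ArrayList<>();
        int doc = 0;
        int n;
        // freqs buffer is null: the enum must have been created with loadFreqs == false
        while ((n = bulk.read(deltas, null, 0, bufferSize)) > 0) {
            for (int i = 0; i < n; i++) {
                doc += deltas[i];
                if (!deleted.get(doc)) {
                    result.add(doc);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleDocsEnum docs = fromArrays(new int[] {1, 4, 5, 9}, new int[] {2, 1, 3, 1});
        BitSet deleted = new BitSet();
        deleted.set(4);
        System.out.println(liveDocs(emulate(docs, false), deleted, 3)); // prints [1, 5, 9]
    }
}
```

In this sketch the caller touches each entry once, doing the delta
sum and the deleted-doc check in the same loop, which is the "saves
an extra pass" point from the first bullet. A codec with a native
bulk decoder could implement the hypothetical BulkDocsEnum directly
and skip emulate entirely.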


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



