lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2722) Sep codec should store less in terms dict
Date Mon, 25 Oct 2010 09:39:19 GMT
Sep codec should store less in terms dict
-----------------------------------------

                 Key: LUCENE-2722
                 URL: https://issues.apache.org/jira/browse/LUCENE-2722
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.0


I'm working on improving Lucene's performance with int block codecs
(FOR/PFOR), but in early perf testing I found that these codecs cause
a big perf hit to those MTQs that need to scan many terms but don't
end up accepting many of those terms (eg fuzzy, wildcard, regexp).

This is because sep codec stores much more in the terms dict, since
each file is separate, ie seek points for each of doc, frq, pos, pyl,
skp files.

So I'd like to shift these seek points to instead be stored in the doc
file, except for the doc seek point itself.  Since a given query will
always need to seek to the doc file, this does not add an extra seek.
But it saves tons of vInt decodes for the next/seke intensive MTQs...


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message