lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2492) Make PulsingCodec (wrapping StandardCodec) the default codec
Date Mon, 07 Jun 2010 16:53:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876307#action_12876307 ]

Michael McCandless commented on LUCENE-2492:
--------------------------------------------

bq. Today it defaults to wrapped StandardCodec and 1 as docFreq cutoff. These two should be
parameters IMO.

+1

{quote}
Also, the cut-off itself should allow to also base it on #bytes consumed, and not just doc-freq.

So really the cutoff should be handled by means of extension, w/ a default impl DocFreqPulsing/Cutoff
(whatever) that lets you specify the doc-freq cutoff, with the ability for someone else to
extend and provide his own cutoff logic.
{quote}
That sounds interesting!

Though one challenge now is the codec does not write into the terms dict whether the term
was inlined or not; instead, it checks the docFreq (when reading).  So if we let a subclass
make this decision, somehow it'd have to store this bit into the terms dict (since "# bytes
consumed" isn't available at read-time).
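
To make the problem concrete, here is a minimal sketch of a pluggable cutoff that records the decision at write time. It is not the actual PulsingCodec code: CutoffPolicy, DocFreqCutoff, ByteSizeCutoff, the "docFreq <= cutoff" comparison and the toy terms dict entry layout (docFreq, one flag, then inlined bytes or a file pointer) are all assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PulsingCutoffSketch {

  /** Hypothetical extension point: decides, at write time, whether to inline a term's postings. */
  public interface CutoffPolicy {
    boolean shouldInline(int docFreq, int postingsBytes);
  }

  /** Assumed default: inline when docFreq is at or below the cutoff. */
  public static final class DocFreqCutoff implements CutoffPolicy {
    private final int maxDocFreq;
    public DocFreqCutoff(int maxDocFreq) { this.maxDocFreq = maxDocFreq; }
    @Override public boolean shouldInline(int docFreq, int postingsBytes) {
      return docFreq <= maxDocFreq;
    }
  }

  /** Alternative policy: inline based on the number of bytes the postings consume. */
  public static final class ByteSizeCutoff implements CutoffPolicy {
    private final int maxBytes;
    public ByteSizeCutoff(int maxBytes) { this.maxBytes = maxBytes; }
    @Override public boolean shouldInline(int docFreq, int postingsBytes) {
      return postingsBytes <= maxBytes;
    }
  }

  /** Writes a toy terms dict entry: docFreq, the inlined flag, then the bytes or a pointer. */
  public static void writeTerm(DataOutputStream out, CutoffPolicy policy, int docFreq,
                               byte[] postings, long postingsFilePointer) throws IOException {
    boolean inline = policy.shouldInline(docFreq, postings.length);
    out.writeInt(docFreq);
    out.writeBoolean(inline); // the extra bit the reader needs
    if (inline) {
      out.writeInt(postings.length);
      out.write(postings);
    } else {
      out.writeLong(postingsFilePointer);
    }
  }

  /** The reader trusts the stored flag; it never re-evaluates the policy. */
  public static boolean readInlinedFlag(DataInputStream in) throws IOException {
    in.readInt(); // skip docFreq
    return in.readBoolean();
  }

  /** Writes one entry to memory and reads back whether it was marked as inlined. */
  public static boolean roundTripInlined(CutoffPolicy policy, int docFreq, byte[] postings) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        writeTerm(out, policy, docFreq, postings, 0L);
      }
      try (DataInputStream in =
               new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return readInlinedFlag(in);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The point of the sketch is only that the reader consumes a stored flag instead of re-deriving the decision from docFreq, which is what would let a byte-based policy work; the cost is one extra bit per term in the terms dict.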

> Make PulsingCodec (wrapping StandardCodec) the default codec
> ------------------------------------------------------------
>
>                 Key: LUCENE-2492
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2492
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>
> PulsingCodec can provide good gains by inlining the postings into the terms dict for
rare terms.  This is especially helpful for primary-key-like fields, since every term is rare
and batch lookups are common (see http://chbits.blogspot.com/2010/06/lucenes-pulsingcodec-on-primary-key.html
for a simple perf test), but it should also be a gain for ordinary fields, thanks to Zipf's
law.
> I think we should make it the default....

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org