lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2492) Make PulsingCodec (wrapping StandardCodec) the default codec
Date Mon, 07 Jun 2010 11:04:41 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876201#action_12876201 ]

Shai Erera commented on LUCENE-2492:
------------------------------------

+1 !

I think, though, that we should make PC more extensible. Today it defaults to wrapping StandardCodec
with a docFreq cutoff of 1. Both of these should be parameters IMO. Also, the cutoff should be
able to depend on the number of bytes consumed, and not just on doc-freq.

So really the cutoff should be handled by means of extension, w/ a default impl DocFreqPulsing/Cutoff
(whatever) that lets you specify the doc-freq cutoff, with the ability for someone else to
extend and provide his own cutoff logic.
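
A minimal sketch of what such an extension point might look like. All names here (PulsingCutoff, DocFreqPulsingCutoff, ByteSizePulsingCutoff, shouldInline) are hypothetical and are not part of Lucene's API. The sketch also assumes that a term is inlined when its value is at or below the cutoff, and that the caller can supply the size of a term's postings in bytes.

```java
// Hypothetical extension point; none of these types exist in Lucene.
interface PulsingCutoff {
    /**
     * Decides whether a term's postings should be inlined into the terms dict.
     *
     * @param docFreq       number of documents containing the term
     * @param postingsBytes assumed size of the term's encoded postings, in bytes
     */
    boolean shouldInline(int docFreq, long postingsBytes);
}

// Default policy: inline when doc-freq is at or below a configurable cutoff.
final class DocFreqPulsingCutoff implements PulsingCutoff {
    private final int maxDocFreq;

    DocFreqPulsingCutoff(int maxDocFreq) {
        if (maxDocFreq < 1) {
            throw new IllegalArgumentException("maxDocFreq must be >= 1");
        }
        this.maxDocFreq = maxDocFreq;
    }

    @Override
    public boolean shouldInline(int docFreq, long postingsBytes) {
        return docFreq <= maxDocFreq;
    }
}

// Alternative policy: inline when the postings fit within a byte budget.
final class ByteSizePulsingCutoff implements PulsingCutoff {
    private final long maxBytes;

    ByteSizePulsingCutoff(long maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        this.maxBytes = maxBytes;
    }

    @Override
    public boolean shouldInline(int docFreq, long postingsBytes) {
        return postingsBytes <= maxBytes;
    }
}
```

With something like this, new DocFreqPulsingCutoff(1) would correspond to today's default of 1, and the wrapped codec could be passed to the pulsing codec alongside the chosen policy.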

I'm still new to Codecs, so perhaps this doesn't make much sense.

> Make PulsingCodec (wrapping StandardCodec) the default codec
> ------------------------------------------------------------
>
>                 Key: LUCENE-2492
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2492
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>
> PulsingCodec can provide good gains by inlining the postings into the terms dict for
> rare terms.  This is especially helpful for primary-key-like fields, since every term is rare
> and batch lookups are common (see http://chbits.blogspot.com/2010/06/lucenes-pulsingcodec-on-primary-key.html
> for a simple perf test), but it should also be a gain for ordinary fields, thanks to Zipf's
> law.
> I think we should make it the default....

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

