lucene-dev mailing list archives

From "Stefan Pohl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5052) bitset codec for off heap filters
Date Wed, 12 Jun 2013 08:43:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681049#comment-13681049 ]

Stefan Pohl commented on LUCENE-5052:
-------------------------------------

The following paper may be informative for this ticket (you can even go beyond
maxdocs/8 when comparing against VInt coding):

A. Moffat and J. S. Culpepper. Hybrid Bitvector Index Compression. In Proceedings of the 12th
Australasian Document Computing Symposium (ADCS 2007), December 2007. pp 25-37.
Available from: http://goanna.cs.rmit.edu.au/~e76763/publications.html

More generally, it would be nice to determine the PostingsListFormat depending on statistics
of individual terms, not only per-field.
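
To make the per-term idea concrete, here is a minimal sketch of a size-based choice that uses only two statistics: the term's document frequency and the segment's maxDoc. The class PerTermEncodingChooser and its methods are hypothetical names for illustration, not part of Lucene. The sketch assumes that a VInt-coded gap costs at least one byte per posting (a VInt carries 7 payload bits per byte), so docFreq is a lower bound on the VInt size, while a bitset always costs one bit per document in the segment.

```java
// Hypothetical helper, not a Lucene API: picks an encoding for one term
// from its document frequency and the number of documents in the segment.
final class PerTermEncodingChooser {

    enum Encoding { BITSET, VINT_GAPS }

    /** Bytes needed to store one bit per document in the segment. */
    static long bitsetBytes(int maxDoc) {
        return ((long) maxDoc + 7) / 8;
    }

    /** Lower bound for VInt-coded gaps: every posting costs at least one byte. */
    static long minVIntBytes(int docFreq) {
        return docFreq;
    }

    /** Chooses the bitset only when even the VInt lower bound is larger. */
    static Encoding choose(int docFreq, int maxDoc) {
        return minVIntBytes(docFreq) > bitsetBytes(maxDoc)
                ? Encoding.BITSET
                : Encoding.VINT_GAPS;
    }
}
```

For a segment of 1,000,000 documents the bitset costs 125,000 bytes, so under these assumptions a term with df = 200,000 would be written as a bitset and a term with df = 1,000 would keep VInt gaps. This is the maxdocs/8 rule from the ticket applied per term; it says nothing about skip lists or decoding speed.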
                
> bitset codec for off heap filters
> ---------------------------------
>
>                 Key: LUCENE-5052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5052
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/codecs
>            Reporter: Mikhail Khludnev
>              Labels: features
>             Fix For: 5.0
>
>
> Colleagues,
> When we filter, we don’t care about any of the scoring factors (norms, positions, tf),
> but filtering should be fast. The obvious way to handle this is to decode the postings
> list and cache it on the heap (CachingWrapperFilter, Solr’s DocSet). Both consuming heap
> and decoding are expensive.
> Let’s write a postings list as a bitset if its df is greater than the segment's maxdocs/8
> (what about skiplists? and overall performance?).
> Besides the codec implementation, the trickiest part to me is designing the API for this.
> How can we let the app know that a term query doesn’t need to be cached on the heap, but
> can be held as an mmapped bitset?
> WDYT?
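
As an illustration of the two representations discussed in the issue, the following sketch turns a sorted list of doc ids into a bitset and measures what the same list would cost as VInt-coded gaps. It uses java.util.BitSet on the heap purely as a stand-in; the issue proposes an mmapped, off-heap bitset written by a codec, which this sketch does not attempt. The class BitsetPostings and its method names are assumptions made for the example.

```java
import java.util.BitSet;

// Hypothetical illustration, not a Lucene codec: compares a bitset with
// VInt-coded doc-id gaps for a single postings list.
final class BitsetPostings {

    /** One bit per document; a set bit marks a document containing the term. */
    static BitSet toBitSet(int[] sortedDocIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : sortedDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** Bytes used when each gap between doc ids is stored with 7 payload bits per byte. */
    static long vIntGapBytes(int[] sortedDocIds) {
        long bytes = 0;
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            // Every gap takes at least one byte, then one more per extra 7 bits.
            do {
                bytes++;
                gap >>>= 7;
            } while (gap != 0);
            previous = docId;
        }
        return bytes;
    }
}
```

With a bitset, applying the filter is a per-document bit test (bits.get(docId)) or a bitwise AND with another bitset, with no decoding step. Whether that outweighs the loss of skip lists is the open question the issue raises.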

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

