lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: [jira] Commented: (LUCENE-1410) PFOR implementation
Date Tue, 06 Oct 2009 21:33:03 GMT
Eks,

>
>     [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762742#action_12762742 ]
>
> Eks Dev commented on LUCENE-1410:
> ---------------------------------
>
> Mike,
> That is definitely the way to go: distribution-dependent encoding, where every Term gets
> individual treatment.
>
> Take, for example, the simple but not all that rare case where the index gets sorted on some
> of the indexed fields (we use it really extensively, e.g. a presorted doc collection on
> user_rights/zip/city, all indexed). There you get perfectly "compressible" postings by simply
> managing intervals of set bits. Updates distort this picture, but we rebuild the index
> periodically and all is good again. At the moment we load them into RAM as Filters in
> IntervalSets. If that were possible in Lucene, we wouldn't bother with Filters (VInt decoding
> on such super dense fields was killing us, even in RAMDirectory) ...

You could try switching the Filter to OpenBitSet when that takes fewer bytes than SortedVIntList.
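
A rough way to make that size comparison is sketched below. It does not use the Lucene classes; it only estimates bytes with java.util.BitSet as a stand-in, assuming the list stores VInt-encoded gaps between sorted doc ids and the bit set stores one bit per document in 64-bit words. The class and method names (FilterSizeSketch, vIntListBytes, bitSetBytes, preferBitSet) are made up for illustration.

```java
import java.util.BitSet;

// Illustrative only: estimates the footprint of two filter representations.
final class FilterSizeSketch {

    // Bytes needed to write one non-negative int as a VInt (7 payload bits per byte).
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Estimated bytes for a sorted doc id list stored as VInt-encoded gaps.
    static long vIntListBytes(BitSet docs) {
        long total = 0;
        int prev = 0;
        for (int doc = docs.nextSetBit(0); doc >= 0; doc = docs.nextSetBit(doc + 1)) {
            total += vIntBytes(doc - prev);
            prev = doc;
        }
        return total;
    }

    // Estimated bytes for a plain bit set covering maxDoc documents (whole 64-bit words).
    static long bitSetBytes(int maxDoc) {
        return ((maxDoc + 63L) / 64L) * 8L;
    }

    // True when the bit set is estimated to be the smaller representation.
    static boolean preferBitSet(BitSet docs, int maxDoc) {
        return bitSetBytes(maxDoc) < vIntListBytes(docs);
    }
}
```

On a super dense field, where nearly every document matches, each gap costs at least one byte in the list but only one bit in the bit set, so the bit set should come out ahead under these assumptions.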

Regards,
Paul Elschot
