lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-5293) Also use EliasFanoDocIdSet in CachingWrapperFilter
Date Thu, 17 Oct 2013 20:02:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798317#comment-13798317 ]

Paul Elschot edited comment on LUCENE-5293 at 10/17/13 8:01 PM:
----------------------------------------------------------------

First patch, 17 Oct 2013, quite rough, one nocommit.
The latest benchmark results for doc id sets are here: http://people.apache.org/~jpountz/doc_id_sets.html

The patch uses EliasFanoDocIdSet for caching when EliasFanoDocIdSet.sufficientlySmallerThanBitSet
returns true. The threshold is currently at 1/7, or at about -0.85 on the log10 scale of the
benchmark results.
Otherwise it uses WAH8DocIdSet, which is the current behaviour.
Does this choice make good use of the benchmark results?
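
To make the threshold concrete, here is a minimal sketch of the selection rule in plain Java. It does not use the Lucene classes: the class, enum and method names are hypothetical stand-ins, and the exact comparison (strict, integer division) is an assumption. Only the 1/7 ratio and the fallback to WAH8 come from the description above.

```java
public class CacheChoiceSketch {
    // Hypothetical labels for the two cached encodings discussed in the comment.
    enum Encoding { ELIAS_FANO, WAH8 }

    // Assumed stand-in for EliasFanoDocIdSet.sufficientlySmallerThanBitSet:
    // true when the number of doc ids is below 1/7 of the doc id range.
    static boolean sufficientlySmaller(int numDocs, int maxDoc) {
        return numDocs < maxDoc / 7;
    }

    // Pick Elias-Fano for sparse sets, otherwise keep the existing WAH8 choice.
    static Encoding choose(int numDocs, int maxDoc) {
        return sufficientlySmaller(numDocs, maxDoc) ? Encoding.ELIAS_FANO : Encoding.WAH8;
    }

    // Position of a set on a log10 axis of numDocs / maxDoc.
    static double log10Ratio(int numDocs, int maxDoc) {
        return Math.log10((double) numDocs / maxDoc);
    }

    public static void main(String[] args) {
        System.out.println(choose(100, 7000));   // sparse: 100 is below 7000 / 7
        System.out.println(choose(1000, 7000));  // at the threshold: not below 7000 / 7
        System.out.println(log10Ratio(1, 7));    // where 1/7 falls on the log10 axis
    }
}
```

Under these assumptions a set holding exactly 1/7 of the doc ids stays with WAH8, and log10(1/7) is roughly -0.845, which matches the "about -0.85" position quoted above.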

To get the number of doc ids to be put in the cache, the patch checks the type of the
actual DocIdSet that is given, and uses the cardinality of FixedBitSet or OpenBitSet. (Perhaps a
similar method should be added to EliasFanoDocIdSet.)
In other cases, the patch falls back to WAH8DocIdSet.
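
The type check and fallback can be sketched as follows. This is an illustration only, not the patch: java.util.BitSet stands in for FixedBitSet and OpenBitSet, and the DocSet interface, both implementing classes, and the method names are hypothetical.

```java
import java.util.BitSet;

public class CardinalitySketch {
    // Hypothetical stand-in for DocIdSet.
    interface DocSet {}

    // Stand-in for a bit set based DocIdSet that can report its cardinality cheaply.
    static final class BitSetDocSet implements DocSet {
        final BitSet bits;
        BitSetDocSet(BitSet bits) { this.bits = bits; }
        int cardinality() { return bits.cardinality(); }
    }

    // Stand-in for any other DocIdSet whose size is not known up front.
    static final class OpaqueDocSet implements DocSet {}

    // Returns the number of doc ids when the concrete type exposes it, else -1.
    static int knownCardinality(DocSet set) {
        if (set instanceof BitSetDocSet) {
            return ((BitSetDocSet) set).cardinality();
        }
        return -1;
    }

    // Elias-Fano only when the cardinality is known and below 1/7 of maxDoc
    // (assumed comparison); every other case falls back to WAH8.
    static String chooseEncoding(DocSet set, int maxDoc) {
        int numDocs = knownCardinality(set);
        if (numDocs >= 0 && numDocs < maxDoc / 7) {
            return "EliasFano";
        }
        return "WAH8";
    }

    public static void main(String[] args) {
        BitSet sparse = new BitSet(700);
        sparse.set(3);
        sparse.set(50);
        System.out.println(chooseEncoding(new BitSetDocSet(sparse), 700));
        System.out.println(chooseEncoding(new OpaqueDocSet(), 700));
    }
}
```

Returning -1 for an unknown cardinality keeps the fallback in one place; a cardinality method on more set types, as suggested above, would simply add branches to knownCardinality.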

I added a DocIdSet argument to cacheImpl(); there is a nocommit for that.

The patch also corrects a mistake in EliasFanoDocIdSet.sufficientlySmallerThanBitSet: the
arguments should be int instead of long, just like those of the EliasFanoDocIdSet constructor.

> Also use EliasFanoDocIdSet in CachingWrapperFilter
> --------------------------------------------------
>
>                 Key: LUCENE-5293
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5293
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-5293.patch
>
>