lucene-dev mailing list archives

From eks dev <>
Subject Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date Tue, 10 Apr 2007 15:41:29 GMT

If I remember correctly, the last time we profiled search with "high density" OR queries, scoring
was taking up to 30% of the time. This was a collection of 8 million short documents fitting comfortably
in RAM. So I am sure that disabling scoring could gain us something in some cases.

I am not familiar enough with the inner workings of scoring to stand 100% behind this statement,
so please take it with a grain of salt.

But anyhow, with Matcher in place, we at least have a chance to prove that it helps
in this scenario. For the filtering case it definitely helps a lot.

On another note:
Paul, would it be possible/easy to have something like the following? It looks easy to add, but I may
be missing something:
BooleanQuery.add(Matcher mtr,
    BooleanClause.Occur occur)
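
To make the idea concrete, here is a rough, self-contained sketch of what a score-free clause could do. It is not Lucene code: the Matcher interface below is only a hypothetical stand-in for the one in the LUCENE-584 patch, and SortedDocsMatcher and MatcherClauses.mustBoth are names invented for this illustration. It shows two required clauses being intersected purely on doc ids, with no floating-point scoring involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Matcher discussed in this thread:
// it iterates matching doc ids in increasing order and carries no score.
interface Matcher {
    boolean next(); // advance to the next match; false when exhausted
    int doc();      // current doc id, valid only after next() returned true
}

// Illustration-only Matcher backed by a sorted array of doc ids.
final class SortedDocsMatcher implements Matcher {
    private final int[] docs;
    private int pos = -1;

    SortedDocsMatcher(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    @Override
    public boolean next() {
        pos++;
        return pos < docs.length;
    }

    @Override
    public int doc() {
        return docs[pos];
    }
}

final class MatcherClauses {
    // Intersects two matchers, roughly what two required (MUST) clauses
    // would do if neither contributed to the score.
    static List<Integer> mustBoth(Matcher a, Matcher b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return hits;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) {
                    return hits;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.next()) {
                    return hits;
                }
            } else {
                if (!b.next()) {
                    return hits;
                }
            }
        }
    }
}
```

A real BooleanQuery.add(Matcher, Occur) would also have to decide how such a clause interacts with scored clauses and with coord; the sketch leaves that open.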

----- Original Message ----
From: Otis Gospodnetic (JIRA) <>
Sent: Tuesday, 10 April, 2007 5:11:32 PM
Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet


Otis Gospodnetic commented on LUCENE-584:

Ah, too bad. :(
Last time I benchmarked Lucene searching on Sun's Niagara vs. non-massive Intel boxes, Intel
boxes with Linux on them actually won, and my impression was that this was due to Niagara's
weak FPU (a known weakness in Niagara, I believe).  Thus, I thought, if we could just skip
scoring and various floating point calculations, we'd see better performance, esp. on Niagara.

Paul, when you say "fastest cache", what exactly are you referring to?  The Niagara I tested
things on had 32GB of RAM, and I gave the JVM 20+GB, so at least the JVM had plenty of RAM
to work with.

> Decouple Filter from BitSet
> ---------------------------
>                 Key: LUCENE-584
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, bench-diff.txt,, Filter-20060628.patch,
HitCollector-20060628.patch, IndexSearcher-20060628.patch,,,
Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch,
> {code}
> package;
> public abstract class Filter implements 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead
of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only
a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It
would be desirable to have an alternative BitSet implementation with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not
designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still
delegate to =java.util.BitSet=.
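
The proposal above can be sketched roughly as follows. AbstractBitSet mirrors the interface from the issue description; DenseBitSet and SortedIntBitSet are hypothetical names invented for this illustration and are not part of any attached patch. The sparse variant keeps only the visible doc ids in a sorted array, so its memory use grows with the number of set bits rather than with the size of the index.

```java
import java.util.Arrays;
import java.util.BitSet;

// Mirrors the interface proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation: delegates to java.util.BitSet (dense storage).
final class DenseBitSet implements AbstractBitSet {
    private final BitSet bits;

    DenseBitSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

// Hypothetical sparse implementation: a sorted array of the set doc ids.
// Lookup is a binary search, O(log n) in the number of set bits.
final class SortedIntBitSet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntBitSet(int[] docs) {
        int[] copy = docs.clone(); // defensive copy, then sort
        Arrays.sort(copy);
        this.sortedDocs = copy;
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

The trade-off is slower random access in exchange for a footprint of about four bytes per visible document, which only pays off when the set is sparse, as in the privileges use case.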

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:
