lucene-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-584) Decouple Filter from BitSet
Date Thu, 05 Apr 2007 21:33:32 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-584:
------------------------------------

    Attachment: bench-diff.txt

Perhaps I did something wrong with the benchmark, but I didn't get any speed-up when using
searcher.match(Query, MatchCollector) vs. searcher.search(Query, HitCollector).

Here are the benchmark numbers (50000 queries with each), HitCollector first, MatchCollector
second:

HITCOLLECTOR:

     [java] ------------> Report Sum By (any) Name (11 about 41 out of 41)
     [java] Operation           round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
     [java] Rounds_4                0  10  10        1       808020        787.5    1,026.04    7,217,624     17,780,736
     [java] Populate -  -  -  -  -  - - - - - -  -   4 -  -  - 2003 -  -   129.9 -  -  61.67 -   9,938,986 -   13,821,952
     [java] CreateIndex             -   -   -        4            1          4.4        0.91    3,937,522     10,916,864
     [java] MAddDocs_2000 -  -  -   - - - - - -  -   4 -  -  - 2000 -  -   138.1 -  -  57.92 -   9,368,584 -   13,821,952
     [java] Optimize                -   -   -        4            1          1.4        2.83    9,938,218     13,821,952
     [java] CloseIndex -  -  -  -   - - - - - -  -   4 -  -  -  - 1 -  - 2,000.0 -  -   0.00 -   9,938,986 -   13,821,952
     [java] OpenReader              -   -   -        4            1         24.0        0.17    9,957,592     13,821,952
     [java] SearchSameRdr_50000 -   - - - - - -  -   4 -  -   50000 -  - 1,070.3 -  - 186.86 -  10,500,146 -   13,821,952
     [java] CloseReader             -   -   -        4            1      4,000.0        0.00    9,059,756     13,821,952
     [java] WarmNewRdr_50 -  -  -   - - - - - -  -   4 -  -  100000 -   16,237.7 -  -  24.63 -   9,060,268 -   13,821,952
     [java] SrchNewRdr_50000        -   -   -        4        50000        265.9      752.02   10,800,006     13,821,952


     [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
     [java] Operation     round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
     [java] MAddDocs_2000     0  10  10        1         2000         94.6       21.15    7,844,112     10,407,936
     [java] MAddDocs_2000 -   1 100  10 -  -   1 -  -  - 2000 -  -   136.7 -  -  14.63 -  8,968,144 -   11,309,056
     [java] MAddDocs_2000     2  10 100        1         2000        173.2       11.55   10,528,264     15,740,928
     [java] MAddDocs_2000 -   3 100 100 -  -   1 -  -  - 2000 -  -   188.7 -  -  10.60 - 10,133,816 -   17,829,888


MATCHCOLLECTOR:


     [java] ------------> Report Sum By (any) Name (11 about 41 out of 41)
     [java] Operation           round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
     [java] Rounds_4                0  10  10        1       808020        781.0    1,034.62   10,566,608     15,859,712
     [java] Populate -  -  -  -  -  - - - - - -  -   4 -  -  - 2003 -  -   130.9 -  -  61.23 -  10,963,452 -   14,806,016
     [java] CreateIndex             -   -   -        4            1         33.9        0.12    3,616,570     11,020,288
     [java] MAddDocs_2000 -  -  -   - - - - - -  -   4 -  -  - 2000 -  -   137.3 -  -  58.29 -  10,445,568 -   14,806,016
     [java] Optimize                -   -   -        4            1          1.4        2.82   10,979,398     14,806,016
     [java] CloseIndex -  -  -  -   - - - - - -  -   4 -  -  -  - 1 -  - 2,000.0 -  -   0.00 -  10,963,452 -   14,806,016
     [java] OpenReader              -   -   -        4            1         22.0        0.18   10,982,058     14,806,016
     [java] SearchSameRdr_50000 -   - - - - - -  -   4 -  -   50000 -  - 1,064.7 -  - 187.84 -  11,060,036 -   14,806,016
     [java] CloseReader             -   -   -        4            1      4,000.0        0.00   10,353,206     14,806,016
     [java] WarmNewRdr_50 -  -  -   - - - - - -  -   4 -  -  100000 -   16,419.0 -  -  24.36 -  10,431,062 -   14,806,016
     [java] SrchNewRdr_50000        -   -   -        4        50000        263.0      760.34   11,912,358     14,806,016


     [java] ------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
     [java] Operation     round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec   avgUsedMem    avgTotalMem
     [java] MAddDocs_2000     0  10  10        1         2000         92.2       21.69    7,844,112     10,407,936
     [java] MAddDocs_2000 -   1 100  10 -  -   1 -  -  - 2000 -  -   136.6 -  -  14.64 -  7,720,352 -   10,407,936
     [java] MAddDocs_2000     2  10 100        1         2000        167.8       11.92   11,325,952     17,571,840
     [java] MAddDocs_2000 -   3 100 100 -  -   1 -  -  - 2000 -  -   199.3 -  -  10.03 - 14,891,856 -   20,836,352
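
To make the comparison concrete, here is a minimal sketch that computes the relative change in rec/s between the two runs for the two search tasks. The class and method names (BenchDiff, percentChange) are hypothetical helpers for illustration and are not part of Lucene or the benchmark package; the rec/s values are copied from the reports above.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene or its benchmark package.
final class BenchDiff {

    /** Relative change from baseline to candidate, in percent. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // rec/s values copied from the HITCOLLECTOR and MATCHCOLLECTOR reports.
        String[] tasks = {"SearchSameRdr_50000", "SrchNewRdr_50000"};
        double[] hitCollector = {1070.3, 265.9};
        double[] matchCollector = {1064.7, 263.0};

        for (int i = 0; i < tasks.length; i++) {
            System.out.printf(Locale.ROOT, "%-20s %8.1f -> %8.1f rec/s (%+.2f%%)%n",
                    tasks[i], hitCollector[i], matchCollector[i],
                    percentChange(hitCollector[i], matchCollector[i]));
        }
    }
}
```

Both search tasks come out within about 1% of each other, with MatchCollector marginally slower in this single run, which matches the observation that no speed-up was visible.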



This is what I did for the benchmark.  I used Doron's handy conf/benchmark.
I added a new .alg file based on micro-standard.alg; here's the diff:


$ diff conf/micro-standard.alg conf/matcher-micro-standard.alg 
60c60
<     { "SearchSameRdr" Search > : 50000
---
>     { "SearchSameRdr" SearchMatch > : 50000
65c65
<     { "SrchNewRdr" Search > : 50000
---
>     { "SrchNewRdr" SearchMatch > : 50000


Then I added 2 new Tasks for benchmarking the Matcher (searcher.search(Query, MatchCollector))
and modified the ReadTask to call searcher.search(Query, HitCollector) instead of the method
that returns Hits.

I commented out all search results traversal and doc retrieval, as I didn't care to measure
that.


> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, BitsMatcher.java, Filter-20060628.patch, HitCollector-20060628.patch, IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java, Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java
>
>
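> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> // Proposed: bits() returns an abstract interface instead of java.util.BitSet.
> // (Each public type would live in its own source file.)
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // Minimal read-only view of a set of document numbers.
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}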
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It would be desirable to have an alternative BitSet implementation with a smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still delegate to =java.util.BitSet=.
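
The proposal quoted above can be illustrated with a minimal, self-contained sketch. Only the AbstractBitSet interface and its get(int) method come from the issue description; BitSetAdapter and SortedIntBits are hypothetical names invented here, and the sketch deliberately leaves out Filter and IndexReader so that it runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.BitSet;

// Interface as proposed in the issue description (non-public here so the
// whole sketch fits in one source file).
interface AbstractBitSet {
    boolean get(int index);
}

// Hypothetical default implementation: delegates to java.util.BitSet.
final class BitSetAdapter implements AbstractBitSet {
    private final BitSet bits;

    BitSetAdapter(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

// Hypothetical sparse implementation: stores only the set indexes in a
// sorted int array and answers get() with a binary search.
final class SortedIntBits implements AbstractBitSet {
    private final int[] sortedIndexes;

    SortedIntBits(int... indexes) {
        sortedIndexes = indexes.clone();
        Arrays.sort(sortedIndexes);
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(sortedIndexes, index) >= 0;
    }
}

final class AbstractBitSetDemo {
    public static void main(String[] args) {
        // Two visible documents in a large index.
        BitSet raw = new BitSet();
        raw.set(3);
        raw.set(1_000_000);

        AbstractBitSet dense = new BitSetAdapter(raw);
        AbstractBitSet sparse = new SortedIntBits(1_000_000, 3);

        // Both views answer the same membership questions.
        for (int doc : new int[] {3, 4, 1_000_000}) {
            System.out.println(doc + ": dense=" + dense.get(doc) + " sparse=" + sparse.get(doc));
        }
    }
}
```

With only two bits set, the java.util.BitSet still has to allocate backing words up to index 1,000,000 (on the order of 125 KB), while the sorted array holds two ints. The trade-off is that get() becomes a binary search rather than a constant-time lookup. The sorted-array class is an illustration only and is not the SortedVIntList.java attached to the issue.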

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

