lucene-java-user mailing list archives

From Raf <r.ventag...@gmail.com>
Subject Re: RangeFilter performance problem using MultiReader
Date Sat, 11 Apr 2009 07:06:33 GMT
OK, here are some details about my tests:

*MultiReader creation*

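import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.Directory;

List<IndexReader> subReaders = new ArrayList<IndexReader>();
for (Directory dir : this.directories) {
    try {
        // Open each index directory read-only (second argument = readOnly)
        subReaders.add(IndexReader.open(dir, true));
    } catch (IOException e) {
        // ... ... ... (error handling not shown)
    }
}
this.reader = new MultiReader(subReaders.toArray(new IndexReader[subReaders.size()]));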

(where *this.directories* is a List<Directory> containing all my index
directories).
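For reference, a MultiReader numbers documents by giving each sub-reader a contiguous block of IDs that starts where the previous sub-reader's block ends. The plain-Java sketch below shows only that offset bookkeeping, assuming three made-up sub-index sizes; DocBaseSketch and its methods are hypothetical names, not Lucene API.

```java
// Hypothetical sketch (not Lucene code): mapping global doc IDs of a
// composite reader back to (sub-index, local doc ID) via doc-base offsets.
public class DocBaseSketch {
    // starts[i] = first global doc ID of sub-index i; last entry = total size
    private final int[] starts;

    public DocBaseSketch(int[] subIndexSizes) {
        starts = new int[subIndexSizes.length + 1];
        for (int i = 0; i < subIndexSizes.length; i++) {
            starts[i + 1] = starts[i] + subIndexSizes[i];
        }
    }

    /** Total number of global doc IDs across all sub-indexes. */
    public int maxDoc() {
        return starts[starts.length - 1];
    }

    /** Binary search for the sub-index whose block contains globalDoc. */
    public int subIndexOf(int globalDoc) {
        int lo = 0, hi = starts.length - 2;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (globalDoc < starts[mid]) {
                hi = mid - 1;
            } else if (globalDoc >= starts[mid + 1]) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        throw new IllegalArgumentException("doc out of range: " + globalDoc);
    }

    /** Doc ID relative to the start of its own sub-index. */
    public int localDoc(int globalDoc) {
        return globalDoc - starts[subIndexOf(globalDoc)];
    }

    public static void main(String[] args) {
        DocBaseSketch m = new DocBaseSketch(new int[] {100, 50, 200});
        System.out.println(m.maxDoc());        // 350
        System.out.println(m.subIndexOf(120)); // 1
        System.out.println(m.localDoc(120));   // 20
        System.out.println(m.subIndexOf(150)); // 2
    }
}
```

Empty sub-indexes are skipped naturally, because their block has zero width.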

*RangeFilter test*

@Test
public void testRangeFilter() throws IOException, ParseException {
    IndexManager im = SearchObjectsFactory.getIndexManager();
    IndexReader reader = im.getReader();
    logger.info("Num docs: " + reader.numDocs());

    logger.info("Before creating filter...");
    long timer = System.currentTimeMillis();
    // Inclusive on both ends: 2008-10-01 00:00:00 .. 2009-01-31 23:59:59
    Filter filter = new RangeFilter("date_doc", "20081001000000",
            "20090131235959", true, true);
    logger.info("After creating filter..." + (System.currentTimeMillis() - timer));

    // Build the DocIdSet three times on the same reader and time each call
    for (int run = 0; run < 3; run++) {
        logger.info("Before reading idSet...");
        timer = System.currentTimeMillis();
        DocIdSet docIdSet = filter.getDocIdSet(reader);
        // The cast relies on RangeFilter returning an OpenBitSet, as it does in Lucene 2.4
        long matches = ((OpenBitSet) docIdSet).cardinality();
        logger.info("After reading idSet..." + matches + " "
                + (System.currentTimeMillis() - timer));
    }
}

*Test results*   (Num docs = 2,940,738)

Times are for the 1st, 2nd and 3rd getDocIdSet() call in the test above.

*1. Original index (12 collections × 6 months = 72 indexes)*

Test  Range                                 Docs   1st call   2nd call   3rd call
1a    20090101000000 - 20090131235959    379,560   2,274 ms   1,477 ms   1,283 ms
1b    20081201000000 - 20090131235959    974,754   4,489 ms   3,333 ms   3,390 ms
1c    20081001000000 - 20090131235959  2,197,590   8,482 ms   7,471 ms   7,424 ms


*2. Consolidated index (1 index)*

Test  Range                                 Docs   1st call   2nd call   3rd call
2a    20090101000000 - 20090131235959    379,560     492 ms     116 ms      83 ms
2b    20081201000000 - 20090131235959    974,754     640 ms     159 ms     138 ms
2c    20081001000000 - 20090131235959  2,197,590     817 ms     322 ms     295 ms


The field I apply the RangeFilter to (date_doc) is a date field with
299,622 unique terms.
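Roughly, a term-based range filter walks every indexed term between the two bounds and sets a bit for each document posted under that term, so its cost depends on the number of distinct terms in the range as well as on the number of matching documents. The toy model below is a simplified sketch of that idea, not Lucene's RangeFilter code: a TreeMap stands in for the term dictionary and java.util.BitSet for OpenBitSet, and ToyRangeFilter is a hypothetical name. Fixed-width yyyyMMddHHmmss strings like the ones in the test sort in date order, so a plain lexicographic range is enough.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical toy model (not Lucene code): sorted terms, each mapped to
// the IDs of the documents that contain it (its "postings").
public class ToyRangeFilter {
    private final TreeMap<String, int[]> postings = new TreeMap<>();

    public void add(String term, int... docs) {
        postings.put(term, docs);
    }

    /** Returns the docs having any term in [lower, upper], both inclusive. */
    public BitSet filter(String lower, String upper) {
        BitSet bits = new BitSet();
        // Walk the sorted term dictionary from lower to upper...
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            // ...and mark every document posted under each term.
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Number of distinct terms the filter has to visit for this range. */
    public int termsInRange(String lower, String upper) {
        return postings.subMap(lower, true, upper, true).size();
    }

    public static void main(String[] args) {
        ToyRangeFilter f = new ToyRangeFilter();
        f.add("20081215093000", 0, 3);
        f.add("20090105120000", 1);
        f.add("20090131235959", 2, 3);
        f.add("20090201000000", 4);
        BitSet hits = f.filter("20090101000000", "20090131235959");
        System.out.println(hits);               // {1, 2, 3}
        System.out.println(hits.cardinality()); // 3
        System.out.println(f.termsInRange("20090101000000", "20090131235959")); // 2
    }
}
```

In this toy model the work is one step per distinct term in the range plus one bit per posting, which is why the number of unique terms in the field matters alongside the number of matching documents.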

Thanks,
Raf


On Fri, Apr 10, 2009 at 7:54 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> <cut>
> Hmmm, interesting!
>
> Can you provide more details about your tests?  EG the code fragment
> showing your query, the creation of the MultiReader, how you run the
> search, etc.?
>
> Is the field that you're applying the RangeFilter on highly unique or
> rather redundant?
>
> Mike
>
