lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: RangeFilter performance problem using MultiReader
Date Sat, 11 Apr 2009 08:31:42 GMT
Ah,

Your test code shows why you do not see a speed improvement with 2.9:
The speed improvement in 2.9 is only visible when executing real searches,
not when calling getDocIdSet alone on the big MultiReader. Internally, the
2.9 search algorithm does not execute getDocIdSet on the complete index (as
you do); it executes it separately for each sub-index and for each segment
of these sub-indexes. Your code executes the filter on the whole index. This
is not faster in 2.9.

To compare speed, please use real search code (Searcher.search())!
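
To illustrate the difference between the two call patterns, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: ToySegment, perSegment, wholeIndex and sample are hypothetical names, and sorted maps stand in for the term dictionaries. It only shows that evaluating the range per sub-index and shifting the hits by a running doc base yields the same documents as evaluating it on a merged view of all terms. It makes no claim about how Lucene implements either path or about their timings.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: plain collections stand in for index segments; no Lucene classes are used. */
class PerSegmentRangeSketch {

    /** One sub-index: a sorted term dictionary mapping each term to segment-local doc ids. */
    static final class ToySegment {
        final int maxDoc;
        final TreeMap<String, int[]> postings = new TreeMap<String, int[]>();

        ToySegment(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        ToySegment add(String term, int... docs) {
            postings.put(term, docs);
            return this;
        }
    }

    /** Per-segment: walk each segment's own term range, then shift hits by the doc base. */
    static BitSet perSegment(List<ToySegment> segments, String lower, String upper) {
        BitSet hits = new BitSet();
        int docBase = 0;
        for (ToySegment seg : segments) {
            for (int[] docs : seg.postings.subMap(lower, true, upper, true).values()) {
                for (int doc : docs) {
                    hits.set(docBase + doc);
                }
            }
            docBase += seg.maxDoc;
        }
        return hits;
    }

    /** Whole index: merge all term dictionaries into one view first, then walk the range once. */
    static BitSet wholeIndex(List<ToySegment> segments, String lower, String upper) {
        TreeMap<String, List<Integer>> merged = new TreeMap<String, List<Integer>>();
        int docBase = 0;
        for (ToySegment seg : segments) {
            for (Map.Entry<String, int[]> e : seg.postings.entrySet()) {
                List<Integer> global = merged.get(e.getKey());
                if (global == null) {
                    global = new ArrayList<Integer>();
                    merged.put(e.getKey(), global);
                }
                for (int doc : e.getValue()) {
                    global.add(docBase + doc);
                }
            }
            docBase += seg.maxDoc;
        }
        BitSet hits = new BitSet();
        for (List<Integer> docs : merged.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                hits.set(doc);
            }
        }
        return hits;
    }

    /** Two tiny made-up segments with date-like terms (illustrative data only). */
    static List<ToySegment> sample() {
        List<ToySegment> segments = new ArrayList<ToySegment>();
        segments.add(new ToySegment(3)
                .add("20081215000000", 0)
                .add("20090110000000", 1, 2));
        segments.add(new ToySegment(2)
                .add("20080901000000", 0)
                .add("20090120000000", 1));
        return segments;
    }

    public static void main(String[] args) {
        List<ToySegment> segments = sample();
        String lower = "20090101000000";
        String upper = "20090131235959";
        System.out.println("per segment: " + perSegment(segments, lower, upper));
        System.out.println("whole index: " + wholeIndex(segments, lower, upper));
    }
}
```

Both methods return the same set of global doc ids for the sample data. The per-segment variant never has to build or walk a combined term view, which is the call pattern described above for searches in 2.9.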

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Raf [mailto:r.ventaglio@gmail.com]
> Sent: Saturday, April 11, 2009 9:07 AM
> To: java-user@lucene.apache.org
> Subject: Re: RangeFilter performance problem using MultiReader
> 
> Ok, here you can find some details about my tests:
> 
> *MultiReader creation*
> 
> IndexReader subReader;
> List<IndexReader> subReaders = new ArrayList<IndexReader>();
> for (Directory dir : this.directories) {
>     try {
>         subReader = IndexReader.open(dir, true);
>         subReaders.add(subReader);
>     } catch (...) {
>         ... ... ...
>     }
> }
> this.reader = new MultiReader(subReaders.toArray(new IndexReader[] {}));
> 
> (where *this.directories* is a List<Directory> containing all my index
> directories).
> 
> *RangeFilter test*
> 
> @Test
>     public void testRangeFilter() throws IOException, ParseException {
> 
>         IndexManager im = SearchObjectsFactory.getIndexManager();
>         IndexReader reader = im.getReader();
>         long timer;
>         DocIdSet docIdSet;
>         Filter filter;
>         logger.info("Num docs: " + reader.numDocs());
> 
>         logger.info("Before creating filter...");
>         timer = System.currentTimeMillis();
>         filter = new RangeFilter("date_doc", "20081001000000",
>                 "20090131235959", true, true);
>         logger.info("After creating filter..."
>                 + (System.currentTimeMillis() - timer));
> 
>         logger.info("Before reading idSet...");
>         timer = System.currentTimeMillis();
>         docIdSet = filter.getDocIdSet(reader);
>         logger.info("After reading idSet..."
>                 + ((OpenBitSet) docIdSet).cardinality() + " "
>                 + (System.currentTimeMillis() - timer));
> 
>         logger.info("Before reading idSet...");
>         timer = System.currentTimeMillis();
>         docIdSet = filter.getDocIdSet(reader);
>         logger.info("After reading idSet..."
>                 + ((OpenBitSet) docIdSet).cardinality() + " "
>                 + (System.currentTimeMillis() - timer));
> 
>         logger.info("Before reading idSet...");
>         timer = System.currentTimeMillis();
>         docIdSet = filter.getDocIdSet(reader);
>         logger.info("After reading idSet..."
>                 + ((OpenBitSet) docIdSet).cardinality() + " "
>                 + (System.currentTimeMillis() - timer));
>     }
> 
> *Test results*   (Num docs = 2,940,738)
> 
> *1 Original index (12 collections * 6 months = 72 indexes)*
> 
> 1a Range [20090101000000 - 20090131235959] --> 379,560 docs
>      2,274 ms     1,477 ms     1,283 ms
> 
> 1b Range [20081201000000 - 20090131235959] --> 974,754 docs
>      4,489 ms     3,333 ms     3,390 ms
> 
> 1c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
>      8,482 ms     7,471 ms     7,424 ms
> 
> 
> *2 Consolidated index (1 index)*
> 
> 2a Range [20090101000000 - 20090131235959] --> 379,560 docs
>      492 ms     116 ms     83 ms
> 
> 2b Range [20081201000000 - 20090131235959] --> 974,754 docs
>      640 ms     159 ms     138 ms
> 
> 2c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
>      817 ms     322 ms    295 ms
> 
> 
> The field on which I am applying the RangeFilter is a date field and it
> has 299,622 unique terms.
> 
> Thanks,
> Raf
> 
> 
> On Fri, Apr 10, 2009 at 7:54 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
> 
> > <cut>
> > Hmmm, interesting!
> >
> > Can you provide more details about your tests?  EG the code fragment
> > showing your query, the creation of the MultiReader, how you run the
> > search, etc.?
> >
> > Is the field that you're applying the RangeFilter on highly unique or
> > rather redundant?
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >



