Return-Path: X-Original-To: apmail-lucene-dev-archive@www.apache.org Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 07E6970C6 for ; Mon, 10 Oct 2011 05:16:56 +0000 (UTC) Received: (qmail 48756 invoked by uid 500); 10 Oct 2011 05:16:54 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 48684 invoked by uid 500); 10 Oct 2011 05:16:53 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 48676 invoked by uid 99); 10 Oct 2011 05:16:52 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Oct 2011 05:16:52 +0000 X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Oct 2011 05:16:51 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id BDE4F3006A2 for ; Mon, 10 Oct 2011 05:16:29 +0000 (UTC) Date: Mon, 10 Oct 2011 05:16:29 +0000 (UTC) From: "Uwe Schindler (Commented) (JIRA)" To: dev@lucene.apache.org Message-ID: <827668624.13757.1318223789779.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123900#comment-13123900 ] Uwe Schindler commented on LUCENE-1536: --------------------------------------- bq. Before they were not applied in the filter, but everywhere else in the query. Now they are applied once per query Sorry this is only correct for the iterator based advancing. For the filter-down-low approach they are of-course still applied. But still we should show benchmarks, that this really hurts. Because caching acceptDocs (not liveDocs!!!!) is very hard to do. Of course a chained FilteredQuery with lots of chanined filters could be simplier (only one static BitSet cached for the whole filter chain). Robert and me had more ideas how to optimize the always appliying acceptDocs case in every scorer: E.g. ConjunctionTermScorer could pass null down for all but one sub-scorer. Ideally the one that gets the liveDocs should be the one with lowest docFreq. The others don't need liveDocs, as the lowDocFreq scorer already applied them and they can never be applied to hits. We should open new issues for those optimizations. > if a filter can support random access API, we should use it > ----------------------------------------------------------- > > Key: LUCENE-1536 > URL: https://issues.apache.org/jira/browse/LUCENE-1536 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search > Affects Versions: 2.4 > Reporter: Michael McCandless > Assignee: Michael McCandless > Priority: Minor > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, changes-yonik-uwe.patch, luceneutil.patch > > > I ran some performance tests, comparing applying a filter via > random-access API instead of current trunk's iterator API. > This was inspired by LUCENE-1476, where we realized deletions should > really be implemented just like a filter, but then in testing found > that switching deletions to iterator was a very sizable performance > hit. > Some notes on the test: > * Index is first 2M docs of Wikipedia. Test machine is Mac OS X > 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. > * I test across multiple queries. 1-X means an OR query, eg 1-4 > means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 > AND 3 AND 4. "u s" means "united states" (phrase search). > * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, > 95, 98, 99, 99.99999 (filter is non-null but all bits are set), > 100 (filter=null, control)). > * Method high means I use random-access filter API in > IndexSearcher's main loop. Method low means I use random-access > filter API down in SegmentTermDocs (just like deleted docs > today). > * Baseline (QPS) is current trunk, where filter is applied as iterator up > "high" (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org