Return-Path: X-Original-To: apmail-lucene-dev-archive@www.apache.org Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AC1A196D9 for ; Sat, 15 Oct 2011 14:47:34 +0000 (UTC) Received: (qmail 8815 invoked by uid 500); 15 Oct 2011 14:47:33 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 8754 invoked by uid 500); 15 Oct 2011 14:47:33 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 8747 invoked by uid 99); 15 Oct 2011 14:47:33 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 15 Oct 2011 14:47:33 +0000 X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 15 Oct 2011 14:47:32 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 0EB95309F4F for ; Sat, 15 Oct 2011 14:47:12 +0000 (UTC) Date: Sat, 15 Oct 2011 14:47:12 +0000 (UTC) From: "Michael McCandless (Commented) (JIRA)" To: dev@lucene.apache.org Message-ID: <719306658.17318.1318690032061.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128209#comment-13128209 ] Michael McCandless commented on LUCENE-1536: -------------------------------------------- bq. Mike: "Normal Filters" always respect acceptDocs, as they generally use the acceptDocs to pass them to IndexReader. OK this makes sense (that many filter impls will need to pass through the accept docs down to eventual enums). So I think the API change is good! bq. The optimizations for CachingWrapperFilter to also cache acceptDocs should maybe done in a separate issue. We have currently no slowdown, as its still faster than before. Improvements can come later. OK we can add it back under a new issue after committing this; but I think it's important to not lose this (CachingWrapperFilter today is able to pre-AND the liveDocs and cache that). It sounds like the cache key just has to become a pair of reader and identity(acceptDocs), but we should only cache when acceptDocs == reader.getLiveDocs, else we can easily over-cache if in the future we pass "more interesting" acceptDocs down. Great! > if a filter can support random access API, we should use it > ----------------------------------------------------------- > > Key: LUCENE-1536 > URL: https://issues.apache.org/jira/browse/LUCENE-1536 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search > Affects Versions: 2.4 > Reporter: Michael McCandless > Assignee: Michael McCandless > Priority: Minor > Labels: gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch > > > I ran some performance tests, comparing applying a filter via > random-access API instead of current trunk's iterator API. > This was inspired by LUCENE-1476, where we realized deletions should > really be implemented just like a filter, but then in testing found > that switching deletions to iterator was a very sizable performance > hit. > Some notes on the test: > * Index is first 2M docs of Wikipedia. Test machine is Mac OS X > 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. > * I test across multiple queries. 1-X means an OR query, eg 1-4 > means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 > AND 3 AND 4. "u s" means "united states" (phrase search). > * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, > 95, 98, 99, 99.99999 (filter is non-null but all bits are set), > 100 (filter=null, control)). > * Method high means I use random-access filter API in > IndexSearcher's main loop. Method low means I use random-access > filter API down in SegmentTermDocs (just like deleted docs > today). > * Baseline (QPS) is current trunk, where filter is applied as iterator up > "high" (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org