Date: Mon, 21 Jan 2013 13:04:13 +0000 (UTC)
From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558738#comment-13558738 ]

Michael McCandless commented on LUCENE-4600:
--------------------------------------------

ALL_PARENTS StandardFacetsCollector (base) vs CountingFacetsCollector (comp):

{noformat}
Task              QPS base   StdDev   QPS comp   StdDev   Pct diff
Respell              55.89   (3.2%)      55.13   (3.9%)    -1.4% ( -8% -  5%)
PKLookup            207.52   (1.6%)     206.95   (1.4%)    -0.3% ( -3% -  2%)
Wildcard             62.22   (3.2%)      62.94   (2.7%)     1.2% ( -4% -  7%)
IntNRQ               17.88   (5.2%)      18.16   (5.7%)     1.6% ( -8% - 13%)
Prefix3              45.56   (4.9%)      46.48   (4.1%)     2.0% ( -6% - 11%)
HighSloppyPhrase      0.80   (9.7%)       0.84   (8.5%)     4.9% (-12% - 25%)
HighPhrase           13.52   (7.7%)      15.09   (8.1%)    11.6% ( -3% - 29%)
LowSloppyPhrase      15.02   (3.9%)      17.15   (4.0%)    14.1% (  5% - 22%)
LowPhrase            14.14   (4.3%)      16.77   (4.9%)    18.6% (  8% - 29%)
MedSloppyPhrase      14.81   (2.6%)      18.33   (2.7%)    23.7% ( 17% - 29%)
Fuzzy2               27.57   (2.6%)      34.95   (3.1%)    26.8% ( 20% - 33%)
AndHighHigh           9.39   (1.6%)      11.92   (1.4%)    27.0% ( 23% - 30%)
MedTerm              14.63   (2.2%)      18.89   (1.7%)    29.1% ( 24% - 33%)
HighTerm              5.28   (1.8%)       7.02   (2.4%)    33.0% ( 28% - 37%)
Fuzzy1               20.79   (2.1%)      27.71   (2.8%)    33.3% ( 27% - 39%)
OrHighLow             4.82   (1.8%)       6.70   (2.6%)    39.1% ( 34% - 44%)
OrHighMed             4.74   (1.8%)       6.61   (3.0%)    39.4% ( 34% - 44%)
OrHighHigh            2.68   (1.8%)       3.77   (2.9%)    40.9% ( 35% - 46%)
MedPhrase            39.21   (3.6%)      55.35   (3.6%)    41.2% ( 32% - 50%)
AndHighMed           36.29   (3.5%)      51.92   (2.0%)    43.1% ( 36% - 50%)
LowTerm              27.96   (3.2%)      41.47   (2.2%)    48.3% ( 41% - 55%)
AndHighLow           64.36   (5.4%)     107.94   (5.7%)    67.7% ( 53% - 83%)
MedSpanNear          70.17   (6.1%)     123.23   (7.4%)    75.6% ( 58% - 94%)
LowSpanNear          70.35   (6.0%)     123.59   (7.1%)    75.7% ( 58% - 94%)
HighSpanNear         70.35   (6.1%)     123.69   (7.8%)    75.8% ( 58% - 95%)
{noformat}

These are nice gains!

> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with a float[] to hold scores as well, if you will aggregate them) during collection, and then at the end when you call getFacetsResults(), it makes a 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't have to tie up transient RAM (fairly small for the bit set but possibly big for the float[]).
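The issue description contrasts two strategies. The first records hits in a bitset, plus an optional float[] of scores, and aggregates them in a second pass. The second aggregates each hit as it is collected. The Python sketch below illustrates that difference under a few assumptions:

- It uses a toy in-memory map from doc id to category ordinals.
- The class names `TwoPassCounter` and `CountingDuringCollection`, and the `DocOrdinals` alias, are hypothetical.
- None of it is the Lucene facet module's API, and it is not how `CountingFacetsCollector` is actually implemented.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy "index": doc id -> category ordinals of that doc.
DocOrdinals = Dict[int, List[int]]


class TwoPassCounter:
    """Gather hits while collecting, then aggregate in a 2nd pass at the end."""

    def __init__(self, max_doc: int, keep_scores: bool = False) -> None:
        # One flag per doc in the index stands in for the hits bitset.
        self.hits: List[bool] = [False] * max_doc
        # Optional per-doc scores stand in for the transient float[]; this
        # counting sketch stores them but never reads them.
        self.scores: Optional[List[float]] = [0.0] * max_doc if keep_scores else None

    def collect(self, doc: int, score: float = 1.0) -> None:
        self.hits[doc] = True
        if self.scores is not None:
            self.scores[doc] = score

    def facet_counts(self, ordinals: DocOrdinals) -> Counter:
        counts: Counter = Counter()
        # Second pass: walk every recorded hit and count its ordinals.
        for doc, hit in enumerate(self.hits):
            if hit:
                counts.update(ordinals.get(doc, []))
        return counts


class CountingDuringCollection:
    """Aggregate each hit's ordinals immediately; no per-hit state is kept."""

    def __init__(self, ordinals: DocOrdinals) -> None:
        self.ordinals = ordinals
        self.counts: Counter = Counter()

    def collect(self, doc: int, score: float = 1.0) -> None:
        # The score is ignored: plain counting needs only the doc's ordinals.
        self.counts.update(self.ordinals.get(doc, []))

    def facet_counts(self) -> Counter:
        return self.counts


if __name__ == "__main__":
    ordinals: DocOrdinals = {0: [1, 2], 1: [2], 3: [1, 3]}
    two_pass = TwoPassCounter(max_doc=4, keep_scores=True)
    one_pass = CountingDuringCollection(ordinals)
    for doc in (0, 1, 3):  # pretend these docs matched the query
        two_pass.collect(doc, score=0.5)
        one_pass.collect(doc, score=0.5)
    print(two_pass.facet_counts(ordinals) == one_pass.facet_counts())  # True
```

Both strategies yield the same counts. The difference is memory: the single-pass version never allocates the per-doc hit array or score array that the two-pass version holds until aggregation. That array is the transient RAM the issue wants to avoid.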