Date: Mon, 11 Feb 2013 16:37:14 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Created] (LUCENE-4771) Query-time join collectors could maybe be more efficient

Robert Muir created LUCENE-4771:
-----------------------------------

             Summary: Query-time join collectors could maybe be more efficient
                 Key: LUCENE-4771
                 URL: https://issues.apache.org/jira/browse/LUCENE-4771
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/join
            Reporter: Robert Muir

I was looking @ these collectors on LUCENE-4765 and I noticed:
* SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a BytesRefHash per-collect.
* MultiValued collector (MV) pulls FieldCache.getDocTermsOrds, but doesn't use the ords; it just looks up each value and adds the bytes per-collect.

I think instead it's worth investigating if SV should use getTermsIndex, and both collectors just collect-up their per-segment ords in something like a BitSet[maxOrd]. When asked for the terms at the end in getCollectorTerms(), they could merge these into one BytesRefHash.

Of course, if you are going to turn around and execute the query against the same searcher anyway (is this the typical case?), this could even be more efficient: no need to hash or instantiate all the terms in memory; we could postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm(), I think... somehow :)
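
To make the ord-collecting idea concrete, here is a rough plain-Java sketch, for illustration only. It uses no Lucene classes: Segment, OrdCollector, setNextSegment and demo are hypothetical names, a sorted String[] stands in for the per-segment terms index that getTermsIndex would provide, java.util.BitSet stands in for the BitSet[maxOrd], and a TreeSet stands in for the BytesRefHash. It covers only the single-valued case.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java stand-in for the proposal; no Lucene classes are used. */
final class OrdCollectingSketch {

    /** Hypothetical per-segment view: sorted terms plus one ord per doc (-1 = no value). */
    static final class Segment {
        final String[] sortedTerms; // ord -> term
        final int[] docToOrd;       // docID -> ord

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }
    }

    /** Records matched ords per segment; term values are not touched in collect(). */
    static final class OrdCollector {
        private final List<Segment> segments = new ArrayList<>();
        private final List<BitSet> seenOrds = new ArrayList<>();
        private Segment current;
        private BitSet currentOrds;

        /** Starts a new segment with an empty bitset sized to its term count (maxOrd). */
        void setNextSegment(Segment segment) {
            current = segment;
            currentOrds = new BitSet(segment.sortedTerms.length);
            segments.add(segment);
            seenOrds.add(currentOrds);
        }

        /** Per-hit work is one array read and one bit set; no hashing, no term lookup. */
        void collect(int doc) {
            int ord = current.docToOrd[doc];
            if (ord >= 0) {
                currentOrds.set(ord);
            }
        }

        /** Resolves each distinct ord to its term once, merging all segments into one set. */
        Set<String> getCollectorTerms() {
            Set<String> merged = new TreeSet<>();
            for (int i = 0; i < segments.size(); i++) {
                String[] terms = segments.get(i).sortedTerms;
                BitSet ords = seenOrds.get(i);
                for (int ord = ords.nextSetBit(0); ord >= 0; ord = ords.nextSetBit(ord + 1)) {
                    merged.add(terms[ord]);
                }
            }
            return merged;
        }
    }

    /** Runs the collector over two small made-up segments. */
    static Set<String> demo() {
        OrdCollector collector = new OrdCollector();
        collector.setNextSegment(new Segment(new String[] {"a", "b", "c"}, new int[] {0, 2, 2, -1}));
        for (int doc = 0; doc < 4; doc++) {
            collector.collect(doc);
        }
        collector.setNextSegment(new Segment(new String[] {"b", "d"}, new int[] {1, 0}));
        collector.collect(1);
        return collector.getCollectorTerms();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch, a term that matches many documents in a segment is resolved once per segment in getCollectorTerms() rather than once per collected document. Whether that wins in practice would depend on the ratio of hits to distinct terms, which is the thing worth investigating.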