Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 71756 invoked from network); 18 Mar 2010 09:22:53 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 18 Mar 2010 09:22:53 -0000 Received: (qmail 18575 invoked by uid 500); 18 Mar 2010 09:22:52 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 18244 invoked by uid 500); 18 Mar 2010 09:22:52 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 17984 invoked by uid 99); 18 Mar 2010 09:22:51 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Mar 2010 09:22:51 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Mar 2010 09:22:48 +0000 Received: from brutus.apache.org (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id C7F21234C4C3 for ; Thu, 18 Mar 2010 09:22:27 +0000 (UTC) Message-ID: <728155768.337361268904147818.JavaMail.jira@brutus.apache.org> Date: Thu, 18 Mar 2010 09:22:27 +0000 (UTC) From: "Michael McCandless (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-2329) Use parallel arrays instead of PostingList objects In-Reply-To: <801047767.334861268891907311.JavaMail.jira@brutus.apache.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846807#action_12846807 ] Michael McCandless commented on LUCENE-2329: -------------------------------------------- This would be great! But, note that term vectors today do not store the term char[] again -- they piggyback on the term char[] already stored for the postings. Though, I believe they store "int textStart" (increments by term length per unique term), which is less compact than the termID would be (increments +1 per unique term), so if eg we someday use packed ints we'd be more RAM efficient by storing termIDs... > Use parallel arrays instead of PostingList objects > -------------------------------------------------- > > Key: LUCENE-2329 > URL: https://issues.apache.org/jira/browse/LUCENE-2329 > Project: Lucene - Java > Issue Type: Improvement > Components: Index > Reporter: Michael Busch > Assignee: Michael Busch > Priority: Minor > Fix For: 3.1 > > > This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324. > In order to avoid having very many long-living PostingList objects in TermsHashPerField we want to switch to parallel arrays. The termsHash will simply be a int[] which maps each term to dense termIDs. > All data that the PostingList classes currently hold will then we placed in parallel arrays, where the termID is the index into the arrays. This will avoid the need for object pooling, will remove the overhead of object initialization and garbage collection. Especially garbage collection should benefit significantly when the JVM runs out of memory, because in such a situation the gc mark times can get very long if there is a big number of long-living objects in memory. > Another benefit could be to build more efficient TermVectors. We could avoid the need of having to store the term string per document in the TermVector. Instead we could just store the segment-wide termIDs. This would reduce the size and also make it easier to implement efficient algorithms that use TermVectors, because no term mapping across documents in a segment would be necessary. Though this improvement we can make with a separate jira issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org