From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Date: Mon, 25 Oct 2010 05:39:19 -0400 (EDT)
Subject: [jira] Created: (LUCENE-2722) Sep codec should store less in terms dict

Sep codec should store less in terms dict
-----------------------------------------

                 Key: LUCENE-2722
                 URL: https://issues.apache.org/jira/browse/LUCENE-2722
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.0


I'm working on improving Lucene's performance with int block codecs (FOR/PFOR), but early performance testing showed that these codecs cause a big slowdown for MTQs (MultiTermQuery subclasses) that need to scan many terms but end up accepting only a few of them (e.g. fuzzy, wildcard, regexp).

This is because the sep codec stores much more in the terms dict: since each postings file is separate, every term entry holds a seek point into each of the doc, frq, pos, pyl and skp files.

So I'd like to move these seek points into the doc file instead, leaving only the doc file's own seek point in the terms dict. Since a given query will always need to seek to the doc file anyway, this does not add an extra seek. But it saves tons of vInt decodes for the next/seek-intensive MTQs...
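To make the vInt savings concrete, here is a toy model of the two layouts. It is only a sketch under stated assumptions, not Lucene code: the class and methods (SepTermsDictSketch, scanOldLayout, scanNewLayout, otherPointers) are hypothetical, the seek points are written as absolute values rather than deltas, and byte arrays stand in for the real index files. The vLong encoding mirrors Lucene's variable-length format (7 data bits per byte, high bit set while more bytes follow). It simply counts vLong decodes while an MTQ-like loop walks every term but accepts only some of them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical model of the LUCENE-2722 idea; not the actual sep codec. */
public class SepTermsDictSketch {
    static int decodes = 0; // number of vLongs decoded during the last scan

    // Variable-length long: 7 data bits per byte, low-order group first,
    // high bit set on every byte except the last.
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVLong(ByteBuffer in) {
        decodes++;
        long v = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return v;
    }

    // Hypothetical per-term seek points into the frq, pos, pyl and skp files.
    static long[] otherPointers(int term) {
        return new long[] {term * 53L, term * 71L, term * 89L, term * 97L};
    }

    /** Current layout: all five seek points live in the terms dict. */
    static int scanOldLayout(int numTerms) {
        ByteArrayOutputStream termsDict = new ByteArrayOutputStream();
        for (int t = 0; t < numTerms; t++) {
            writeVLong(termsDict, t * 37L); // doc seek point
            for (long fp : otherPointers(t)) writeVLong(termsDict, fp);
        }
        ByteBuffer in = ByteBuffer.wrap(termsDict.toByteArray());
        decodes = 0;
        for (int t = 0; t < numTerms; t++) {
            // Entries are variable-length, so all five pointers must be
            // decoded just to reach the next term, accepted or not.
            for (int i = 0; i < 5; i++) readVLong(in);
        }
        return decodes;
    }

    /** Proposed layout: terms dict keeps only the doc seek point. */
    static int scanNewLayout(int numTerms, int acceptEvery) {
        ByteArrayOutputStream termsDict = new ByteArrayOutputStream();
        ByteArrayOutputStream docFile = new ByteArrayOutputStream();
        for (int t = 0; t < numTerms; t++) {
            writeVLong(termsDict, docFile.size()); // where this term's doc block starts
            for (long fp : otherPointers(t)) writeVLong(docFile, fp);
            // (the term's doc postings would follow here in a real file)
        }
        ByteBuffer terms = ByteBuffer.wrap(termsDict.toByteArray());
        ByteBuffer doc = ByteBuffer.wrap(docFile.toByteArray());
        decodes = 0;
        for (int t = 0; t < numTerms; t++) {
            long docFp = readVLong(terms);
            if (t % acceptEvery == 0) {      // term accepted by the MTQ
                doc.position((int) docFp);   // the doc-file seek it needs anyway
                for (int i = 0; i < 4; i++) readVLong(doc);
            }
        }
        return decodes;
    }

    public static void main(String[] args) {
        System.out.println("old layout decodes: " + scanOldLayout(1000));
        System.out.println("new layout decodes: " + scanNewLayout(1000, 100));
    }
}
```

With 1000 terms and 1 in 100 accepted, the model decodes 5000 vLongs in the old layout versus 1000 + 10 * 4 = 1040 in the new one. The fewer terms an MTQ accepts, the closer the cost gets to one decode per term, which is the case that hurts fuzzy, wildcard and regexp today.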