Date: Wed, 28 Aug 2013 08:56:53 +0000 (UTC)
From: "Han Jiang (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Subject: [jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

     [ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Jiang updated LUCENE-3069:
------------------------------
    Attachment: LUCENE-3069.patch

Patch to show the impersonation hack for the Pulsing format.

We cannot perfectly impersonate the old Pulsing format yet: the old format divided the metadata block into inlined bytes and wrapped bytes, so when the term dict reader reads the length of the metadata block, what it actually gets is the length of the 'inlined block' only, and the 'wrapped block' won't be loaded for the wrapped PF.

However, introducing a new method in PostingsReaderBase doesn't seem to be a good way to handle this...

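To make the length mismatch described above concrete, here is a minimal sketch in plain Java. The byte layout (a 4-byte length, then the inlined bytes, then the wrapped bytes) and the names SplitMetadataSketch, writeOldStyle and readByLengthPrefix are assumptions made up for illustration; this is not Lucene's actual on-disk Pulsing format or API. It only shows why a reader that trusts the length prefix never sees the wrapped part.

```java
import java.nio.ByteBuffer;

public class SplitMetadataSketch {

    // Hypothetical "old style" layout, assumed for illustration only:
    // [int inlinedLength][inlined bytes][wrapped bytes]
    public static byte[] writeOldStyle(byte[] inlined, byte[] wrapped) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + inlined.length + wrapped.length);
        buf.putInt(inlined.length); // the prefix covers the inlined part only
        buf.put(inlined);
        buf.put(wrapped);           // written right after, with no length of its own
        return buf.array();
    }

    // A generic reader that assumes the length prefix covers the whole metadata block.
    public static byte[] readByLengthPrefix(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        int length = buf.getInt();
        byte[] metadata = new byte[length];
        buf.get(metadata);          // copies only 'length' bytes; the rest is never loaded
        return metadata;
    }

    public static void main(String[] args) {
        byte[] block = writeOldStyle(new byte[] {1, 2, 3}, new byte[] {7, 8});
        byte[] loaded = readByLengthPrefix(block);
        System.out.println("block bytes: " + block.length + ", loaded: " + loaded.length);
    }
}
```

In this sketch the block holds 9 bytes (4 for the prefix, 3 inlined, 2 wrapped) but the reader loads only the 3 inlined bytes, which is the kind of gap the comment above describes for the wrapped postings format.
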
> Lucene should have an entirely memory resident term dictionary
> --------------------------------------------------------------
>
>                 Key: LUCENE-3069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3069
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Simon Willnauer
>            Assignee: Han Jiang
>              Labels: gsoc2013
>             Fix For: 5.0, 4.5
>
>         Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org