Date: Wed, 7 Apr 2010 10:10:33 +0000 (UTC)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2373) Change StandardTermsDictWriter to work with streaming and append-only filesystems

[ https://issues.apache.org/jira/browse/LUCENE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854409#action_12854409 ]
Michael McCandless commented on LUCENE-2373:
--------------------------------------------

I would love to make Lucene truly write-once (and remove IndexOutput.seek), but... this approach makes me a little nervous.

In some environments, relying on the length of the file to be accurate might be risky: it's metadata, which can be subject to different client-side caching than the file's contents. E.g. on NFS I've seen issues where the file length was stale yet the file contents were not.

Maybe we could offer a separate codec that takes this approach, for use on filesystems like HDFS that can't seek during write? We should refactor the standard codec so that "where this long gets stored" can be easily overridden by a subclass.

Or, alternatively, we could write this "index of the index" to a separate file?

> Change StandardTermsDictWriter to work with streaming and append-only filesystems
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-2373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2373
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Andrzej Bialecki
>             Fix For: 3.1
>
>
> Since early 2.x times Lucene has used a skip/seek/write trick to patch the length of the terms dict into a place near the start of the output data file. This, however, made it impossible to use Lucene with append-only filesystems such as HDFS.
> In the post-flex trunk the following code in StandardTermsDictWriter initiates this:
> {code}
> // Count indexed fields up front
> CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT);
> out.writeLong(0); // leave space for end index pointer
> {code}
> and completes this in close():
> {code}
> out.seek(CodecUtil.headerLength(CODEC_NAME));
> out.writeLong(dirStart);
> {code}
> I propose to change this layout so that this pointer is stored simply at the end of the file.
> It's always 8 bytes long, and we know the final length of the file from Directory, so reading it costs a single additional seek(length - 8), which is not much considering the benefits.
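A minimal sketch of the proposed trailer layout, written in plain Java (java.io / java.nio) rather than Lucene's IndexOutput/Directory API. The writer only ever appends: a header, the term entries, a directory of entry offsets, and finally the directory's start offset as the last 8 bytes. The reader locates the directory with one positioned read at length - 8. The class name, the MAGIC constant, and the writeUTF entry encoding are hypothetical illustrations, not the actual StandardTermsDictWriter format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical sketch: not Lucene's codec API.
class TrailerPointerSketch {
    // Hypothetical header magic; stands in for CodecUtil.writeHeader(...).
    static final int MAGIC = 0x3fd76c17;

    // Writes header, term entries, a directory of entry offsets, and then an
    // 8-byte trailer holding the directory start. Purely sequential: no seek().
    static byte[] write(String[] terms) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);                // header only, no placeholder long
        long[] offsets = new long[terms.length];
        for (int i = 0; i < terms.length; i++) {
            offsets[i] = out.size();        // byte offset where this entry starts
            out.writeUTF(terms[i]);
        }
        long dirStart = out.size();         // the "index of the index" starts here
        out.writeInt(terms.length);
        for (long off : offsets) {
            out.writeLong(off);
        }
        out.writeLong(dirStart);            // trailer: always the last 8 bytes
        out.flush();
        return bytes.toByteArray();
    }

    // One positioned read at (length - 8). This trusts the reported length,
    // which is the assumption questioned for NFS in the comment above.
    static long readDirStart(byte[] file) {
        return ByteBuffer.wrap(file).getLong(file.length - 8);
    }

    // Looks up the ord-th term through the directory found via the trailer.
    static String termAt(byte[] file, int ord) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int dirStart = (int) readDirStart(file);
        int count = buf.getInt(dirStart);
        if (ord < 0 || ord >= count) {
            throw new IndexOutOfBoundsException("ord " + ord + " of " + count);
        }
        int entry = (int) buf.getLong(dirStart + 4 + 8 * ord);
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(file, entry, file.length - entry));
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = write(new String[] {"apple", "banana", "cherry"});
        System.out.println(termAt(file, 1)); // prints "banana"
    }
}
```

Because readDirStart depends entirely on file.length, a stale length reported by the filesystem would make the reader pick up the wrong 8 bytes; that is the trade-off behind suggesting a separate codec for append-only filesystems, or a separate pointer file.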