From: Dmitry Serebrennikov
To: Lucene Developers List
Date: Thu, 12 Aug 2004 12:44:33 -0600
Subject: Re: optimized disk usage when creating a compound index
Message-ID: <411BBA91.5030602@earthlink.net>
In-Reply-To: <26995588$109211885141186943d4a6c2.64920639@config18.schlund.de>

Hi Christoph,

I agree that your approach achieves better disk usage than deleting segments as they are being merged into the compound file, chiefly because most indexes have one or two large files and the rest are small.

I have not reviewed your latest code yet (it's a bit hard without a checked-out working copy of the CVS image; btw, could you post diffs so others can more readily review them?), but from what you are describing, here's what I think.

It sounds like it would work, but it also sounds a bit kludgy. The main thing that I don't like is that we are now inventing another way of doing what Lucene already does - maintaining index integrity across filesystem changes and safely deleting unneeded files. Lucene already has a way of switching to the new segments file, but we are proposing something similar with the renaming of the cfs file.

A note on the norms with .f and .s files - this is getting complicated...

One note on SegmentReader.files() - we should probably have the "tmp" extension listed there so we can clean up segments that failed to create a cfs file.

Here's an alternative idea that leverages the existing Lucene segments file: could we simply create the compound file in a new segment? This way we don't have to invent the "tmp" file or change anything else about the files (like the norms stuff).

All in all, I haven't been involved in the Lucene codebase closely enough lately, and this is starting to impact things like norms, locks, and merging, so I don't feel qualified to make the final call on this. I'd like to hear what Doug and others think. From my point of view, I don't really see anything *wrong* with the latest set of changes (we just need to add the "tmp" file to SegmentReader.files()), but it doesn't strike me as an obviously *right* way to do this yet either. So I'll change my vote to a 0 and see what others think. :)

0.

Cheers.
Dmitry.
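
For readers following the thread, here is a rough sketch of the "compound file in a new segment" alternative raised above. It is plain Java, not Lucene code, and rests on several assumptions: a hypothetical index layout in which a single pointer file named `segments` holds the name of the live segment, a compound file that is just a concatenation of the old segment's files (a real one would also need a table of contents), and the illustrative names `SegmentSwitchSketch`, `compactIntoNewSegment`, and `liveSegment`, none of which exist in Lucene. The point it tries to show is the ordering: build the new segment, publish it by replacing the pointer file, and only then delete the old files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: the layout and all names here are assumptions, not Lucene APIs. */
final class SegmentSwitchSketch {

    /** Hypothetical pointer file that names the live segment. */
    static final String SEGMENTS = "segments";

    /** Returns the name of the segment the pointer file currently refers to. */
    static String liveSegment(Path dir) {
        try {
            byte[] raw = Files.readAllBytes(dir.resolve(SEGMENTS));
            return new String(raw, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Packs oldSeg's files into newSeg.cfs, switches the pointer, then deletes oldSeg's files. */
    static void compactIntoNewSegment(Path dir, String oldSeg, String newSeg, List<String> extensions) {
        try {
            // 1. Build the compound file under the new segment's name.
            //    Nothing refers to it yet, so a crash here leaves the old segment live.
            try (OutputStream out = Files.newOutputStream(dir.resolve(newSeg + ".cfs"))) {
                for (String ext : extensions) {
                    Files.copy(dir.resolve(oldSeg + "." + ext), out); // simplification: plain concatenation
                }
            }
            // 2. Publish: write a fresh pointer file and move it over the old one.
            //    With ATOMIC_MOVE, replacing an existing target is implementation specific.
            Path pending = dir.resolve(SEGMENTS + ".new");
            Files.write(pending, newSeg.getBytes(StandardCharsets.UTF_8));
            Files.move(pending, dir.resolve(SEGMENTS),
                    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
            // 3. Only now are the old segment's files unreferenced and safe to delete.
            for (String ext : extensions) {
                Files.deleteIfExists(dir.resolve(oldSeg + "." + ext));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway directory with segment _1 and compacts it into segment _2. */
    static Path demo() {
        try {
            Path dir = Files.createTempDirectory("segment-sketch");
            Files.write(dir.resolve("_1.fdt"), "stored fields".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("_1.tis"), "term dictionary".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve(SEGMENTS), "_1".getBytes(StandardCharsets.UTF_8));
            compactIntoNewSegment(dir, "_1", "_2", Arrays.asList("fdt", "tis"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = demo();
        System.out.println("live segment: " + liveSegment(dir));
    }
}
```

In this sketch no "tmp" extension is needed: a crash before step 2 leaves only an unreferenced `_2.cfs` behind, which any cleanup that removes files not named by the pointer file could discard. Peak disk usage is still the old files plus the new compound file, so this variant addresses the integrity concern rather than the disk-usage one.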