Message-ID: <40325170.7050601@apache.org>
Date: Tue, 17 Feb 2004 09:37:52 -0800
From: Doug Cutting
To: Lucene Developers List
Subject: Re: Dmitry's Term Vector stuff, plus some

Grant Ingersoll wrote:
> Was wondering if you consider your comments on the Term vector stuff to be a show stopper or not?
> There hasn't been much response to your questions, so I wanted to bring it up again, as I do not want to see this go the way of the last attempt.

I proposed three changes:

1. add a format version number to the new file formats, so that they can be altered back-compatibly;
2. use a more compressed format for term vectors;
3. don't read positional information unless it is asked for.

I'd like to see all of these fixed before a 1.4 release.

So which should be fixed before things are first committed? That's a tricky question. Once something's committed it's hard to remove, and it's also hard to be sure that any more work will be done on it. Thus it's safest to only commit things that are just about release-ready. Exceptions can be made when a developer can commit to continued work. Are you committed to completing these changes in a timely manner, e.g., in the next few months?

I think change (1) is essential before anything is committed, to avoid breaking folks when the format does change. However, if (2) is addressed before the 1.4 release, then there will be back-compatibility code in the 1.4 release, in order to be able to read the current format, even though it was never released. That would be unfortunate. So it would be best if both (1) and (2) were fixed before things are first committed. As for (3), it could probably wait a bit.

Am I setting the bar too high here? I really appreciate that you've done all this work, and I'm eager to get it committed. This is a very sought-after feature. But I don't want to commit something that's not quite ready.

Perhaps you feel you've done your share already, and want others to pick up the slack, fixing things like those named above. If that's the case then perhaps we should go ahead and commit your changes as-is, and hope that others polish things a bit before a 1.4 release. I'd prefer not to operate that way, but that might be our only option.

Doug
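
For readers unfamiliar with the idea behind change (1), here is a minimal sketch of a file whose first field is a format version number, so that a reader can keep accepting an older layout after a more compact one is introduced. Everything in it is an assumption made for illustration: the layout, the version numbers, and the names `write_freqs` and `read_freqs` are hypothetical and are not the term vector format under discussion.

```python
import io
import struct

# Hypothetical version numbers for a made-up file layout.
VERSION_FIXED32 = 1  # each frequency stored as a 4-byte signed int
VERSION_FIXED16 = 2  # each frequency stored as a 2-byte unsigned int
CURRENT_VERSION = VERSION_FIXED16

# struct format used for one frequency value in each known version.
_ITEM_FORMAT = {VERSION_FIXED32: ">i", VERSION_FIXED16: ">H"}


def write_freqs(out, freqs, version=CURRENT_VERSION):
    """Write a version number, a count, then the frequencies."""
    item = _ITEM_FORMAT[version]
    out.write(struct.pack(">i", version))
    out.write(struct.pack(">i", len(freqs)))
    for f in freqs:
        out.write(struct.pack(item, f))


def read_freqs(inp):
    """Read frequencies written in any known version of the layout."""
    (version,) = struct.unpack(">i", inp.read(4))
    item = _ITEM_FORMAT.get(version)
    if item is None:
        # Fail clearly instead of misreading bytes from an unknown layout.
        raise ValueError("unknown format version: %d" % version)
    (count,) = struct.unpack(">i", inp.read(4))
    size = struct.calcsize(item)
    return [struct.unpack(item, inp.read(size))[0] for _ in range(count)]


# A file written in the old layout and one written in the new layout
# both decode to the same values.
old_file = io.BytesIO()
write_freqs(old_file, [3, 1, 4], version=VERSION_FIXED32)
new_file = io.BytesIO()
write_freqs(new_file, [3, 1, 4])
old_file.seek(0)
new_file.seek(0)
assert read_freqs(old_file) == read_freqs(new_file) == [3, 1, 4]
```

The sketch also shows the cost mentioned above: once a layout has shipped, the reader has to carry a branch for it, which is why it is cheaper to settle the format before the first commit than after.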