From: Doug Cutting
Date: Tue, 17 Feb 2004 13:03:35 -0800
To: Lucene Developers List
Subject: Re: Dmitry's Term Vector stuff, plus some

Grant Ingersoll wrote:
> Do you see any reason to write position information at all for the term vectors?

It could be useful to some folks. For example, you might want to expand a query only with terms that occur near the query terms, as in automatic phrase identification.
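To make the phrase-identification case concrete, here is a minimal Java sketch of how stored positions could drive "nearby term" query expansion. It assumes the term vector for one document is modeled as a plain Map from term to its positions; NearbyTermExpander, nearbyTerms, and the window parameter are hypothetical names for illustration, not Lucene's term vector API.

```java
import java.util.*;

public class NearbyTermExpander {
    /**
     * Hypothetical helper: returns terms (other than the query terms) that occur
     * within `window` positions of any occurrence of a query term.
     * Assumption: a term vector is modeled as term -> positions in one document,
     * not Lucene's actual term vector classes.
     */
    public static SortedSet<String> nearbyTerms(Map<String, int[]> termVector,
                                                Set<String> queryTerms,
                                                int window) {
        // Gather every position at which some query term occurs.
        List<Integer> queryPositions = new ArrayList<>();
        for (String q : queryTerms) {
            int[] positions = termVector.get(q);
            if (positions != null) {
                for (int p : positions) {
                    queryPositions.add(p);
                }
            }
        }
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : termVector.entrySet()) {
            String term = entry.getKey();
            if (queryTerms.contains(term)) {
                continue; // query terms themselves are not expansion candidates
            }
            outer: for (int p : entry.getValue()) {
                for (int qp : queryPositions) {
                    if (Math.abs(p - qp) <= window) {
                        result.add(term);
                        break outer; // one nearby occurrence is enough
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "the quick brown fox jumps over the lazy dog" at positions 0..8
        Map<String, int[]> tv = new HashMap<>();
        tv.put("the", new int[]{0, 6});
        tv.put("quick", new int[]{1});
        tv.put("brown", new int[]{2});
        tv.put("fox", new int[]{3});
        tv.put("jumps", new int[]{4});
        tv.put("over", new int[]{5});
        tv.put("lazy", new int[]{7});
        tv.put("dog", new int[]{8});
        System.out.println(nearbyTerms(tv, Set.of("fox"), 1)); // prints [brown, jumps]
    }
}
```

Note that this needs only positions, not the original text, which is exactly the situation in which storing positions in the vector would pay off.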
In general, the vector stuff is just a constant-factor improvement over re-tokenizing the text of the document, but hopefully a substantial one. If folks are doing computations that require positional information but don't require the actual text (e.g., they don't need user-readable fragments), then positions could be handy.

But, certainly, most applications for term vectors do not need positions, and I would not be upset if these were left out of the first version.

Doug