Date: Thu, 23 Jan 2003 19:56:38 +0100 (CET)
From: Leo Galambos
To: Lucene Developers List
Subject: Re: Automatic stop-words
In-Reply-To: <3E2F0D42.70602@lucene.com>

> >>>When I want to search "Linux", nothing is found.
> >>>This word is in every article in the content.
> >>>Or is something wrong?
> >>Yes :)
> > why? log(1)=0. it is OK, I think :-))) so where's any problem?
>
> Thus a term which occurs in every document gets a value of 1.0, not zero.

I then believe in UFO :-) So does he have long documents? Can it then fall
to 0, when you normalize the vector (tf_linux=1 |w|=20000words)?

-g-
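
To make the numbers in this exchange concrete, here is a minimal sketch of the two effects being discussed. It is not taken from the message: the smoothed formula 1 + log(N / (df + 1)) and the 1/sqrt(length) normalization are assumptions chosen to match the "1.0, not zero" remark, and every function name is hypothetical. It does not model any rounding or compact storage of the norm, so it cannot say whether an indexed value would actually reach 0.

```python
import math


def idf_plain(num_docs, doc_freq):
    """Textbook idf: log(N / df). Zero when the term is in every document."""
    return math.log(num_docs / doc_freq)


def idf_smoothed(num_docs, doc_freq):
    """Assumed smoothed idf: 1 + log(N / (df + 1)). Stays close to 1.0
    for a term that occurs in every document."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Assumed length normalization: 1 / sqrt(number of terms in the document)."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(tf, num_docs, doc_freq, num_terms):
    """Hypothetical combined weight: sqrt(tf) * smoothed idf * length norm."""
    return math.sqrt(tf) * idf_smoothed(num_docs, doc_freq) * length_norm(num_terms)


if __name__ == "__main__":
    n = 1000  # hypothetical collection size; "Linux" occurs in all n documents
    print("plain idf:   ", idf_plain(n, n))
    print("smoothed idf:", idf_smoothed(n, n))
    print("length norm for 20000 words:", length_norm(20000))
    print("weight with tf=1:", term_weight(1, n, n, 20000))
```

Under these assumptions the plain idf is exactly 0 for a term found in every document, the smoothed idf is just under 1.0, and the weight for tf=1 in a 20000-word document is small but still positive in floating point.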