Message-ID: <3FD8DB08.6020302@smail.uni-koeln.de>
Date: Thu, 11 Dec 2003 22:00:56 +0100
From: Thomas Krämer
To: Lucene Users List
Subject: build a case insensitive index

Hello Lucene Users,

I need a document-term matrix to initialize a neural network that I want to use to integrate user feedback into the retrieval process. Until now, I have been using a slightly modified version of the class from the IndexHTML example.

How can I create an index of all the terms in a collection without "term" and "Term" being indexed twice? The example uses a StandardAnalyzer, and the documentation says:

    Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter.

So why do I get duplicate entries for terms in upper and lower case?

Regards,
Thomas
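
To make the desired result concrete, here is a minimal sketch of a case-insensitive document-term matrix, written in plain Java without Lucene. It assumes a simple split on non-alphanumeric characters rather than StandardTokenizer's rules, and it applies no stop-word filtering. The class `TermMatrixSketch` and its methods are hypothetical names for illustration only and are not part of the Lucene API. The point it shows is just the lower-casing step: every token is folded to lower case before it is assigned a column, so "term" and "Term" end up in the same column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds a document-term count
// matrix in which tokens are lower-cased before counting.
class TermMatrixSketch {

    // Splits on runs of non-letter, non-digit characters and lower-cases
    // each token (assumption: this stands in for a real analyzer chain).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Assigns each distinct lower-cased term a column index in first-seen order.
    static Map<String, Integer> buildVocabulary(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String token : tokenize(doc)) {
                vocab.putIfAbsent(token, vocab.size());
            }
        }
        return vocab;
    }

    // Returns counts[d][t]: how often term t occurs in document d.
    static int[][] buildMatrix(List<String> docs, Map<String, Integer> vocab) {
        int[][] counts = new int[docs.size()][vocab.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String token : tokenize(docs.get(d))) {
                counts[d][vocab.get(token)]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Term weighting",
                "another term and a TERM");
        Map<String, Integer> vocab = buildVocabulary(docs);
        int[][] matrix = buildMatrix(docs, vocab);
        System.out.println(vocab.keySet());
        for (int[] row : matrix) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch the vocabulary has a single entry for "term" regardless of how it is capitalized in the input, which is the behavior the question expects from an analyzer chain that includes LowerCaseFilter.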