From: Andrzej Bialecki
To: Lucene Users List
Date: Mon, 05 Jul 2004 18:34:53 +0200
Subject: Re: Underscore character and case issue

Robert Brown wrote:
> I traverse a series of files under a parent directory (similar to the
> demo sample) and store the filename in a Document Keyword field called
> 'Filename'.
> I am using the StandardAnalyzer for both building the index
> and searching the index.

... and here lies your problem. StandardAnalyzer lowercases the tokens and strips most of the non-letters from them.

I suggest using Luke (http://www.getopt.org/luke) to look inside your index and see the terms as they ended up in the index, and to try out some other analyzers to see which is the most appropriate in your case.

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
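
To make the mismatch described above concrete, here is a minimal sketch in plain Java, with no Lucene dependency. It assumes a keyword-style field is indexed as a single term exactly as supplied, while the query text is lowercased and split on non-alphanumeric characters. The class name, `simpleAnalyze`, `keywordTerm`, and the sample filename are all hypothetical, and `simpleAnalyze` is only a rough stand-in: the real StandardAnalyzer applies its own, more detailed tokenization rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzerMismatchSketch {

    // Hypothetical stand-in for an analyzer: lowercases the input and splits
    // on anything that is not a letter or digit. This is NOT the real
    // StandardAnalyzer, only an approximation of "lowercase and strip".
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        String lowered = text.toLowerCase(Locale.ROOT);
        for (String piece : lowered.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Assumption: a keyword-style field becomes one index term, unchanged.
    static String keywordTerm(String value) {
        return value;
    }

    public static void main(String[] args) {
        String filename = "My_File.TXT"; // hypothetical example value

        String indexedTerm = keywordTerm(filename);
        List<String> queryTerms = simpleAnalyze(filename);

        System.out.println("indexed term: " + indexedTerm);
        System.out.println("query terms:  " + queryTerms);
        // None of the analyzed query terms equals the verbatim indexed term.
        System.out.println("exact match:  " + queryTerms.contains(indexedTerm));
    }
}
```

Under these assumptions the analyzed query never contains the original mixed-case, underscore-bearing term, which is why inspecting the actual index terms (for example with Luke) is a useful first step.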