From: Iain Young
To: "'Lucene Users List'"
Subject: RE: Disabling modifiers?
Date: Tue, 16 Dec 2003 11:46:43 -0000

I think it is a problem with the indexing. I've found another example...
    WS-CA-PP00-PROCESS-YYMM

I've looked at the index, and it has been tokenized into 3 tokens:

    WS
    CA-PP00-PROCESS
    YYMM

It looks as though I might have to use a custom tokenizer as well as an analyzer, but does anyone have any idea why the standard tokenizer would have split the variable up like this (i.e. why didn't it split the middle part, and only split off the word at either end)?

The only thing I can think of is that there are several other variables in the source beginning with WS- or ending with -YYMM, so could the tokenizer have seen these and be doing something clever with them?

Thanks,
Iain
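For what it's worth, the split may not involve anything clever. The NUM rule in StandardTokenizer's grammar appears to keep a hyphenated run together only when every other segment contains a digit. Here only PP00 has a digit, so the longest run that satisfies the rule is PP00 plus one neighbour on each side (CA-PP00-PROCESS), and WS and YYMM are left as separate tokens.

Below is a minimal sketch of the token-character rule a custom tokenizer could use instead. The class and method names are hypothetical, and it is plain Java with no Lucene dependency. Letters, digits and hyphens all count as token characters, and leading or trailing hyphens are trimmed on the assumption that COBOL names never start or end with one. In Lucene 1.x the predicate would presumably go into the isTokenChar(char) method of a CharTokenizer subclass, the base class WhitespaceTokenizer and LetterTokenizer build on. That wiring is assumed and not shown.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene code): splits COBOL-like source text into
 * tokens, treating letters, digits and '-' as token characters so that
 * hyphenated names such as WS-CA-PP00-PROCESS-YYMM stay in one piece.
 */
public class CobolTokenSketch {

    /** Letters, digits and hyphens can all appear inside a COBOL name. */
    public static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    /** Scans the text once, emitting each maximal run of token characters. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else {
                flush(current, tokens);
            }
        }
        flush(current, tokens); // emit the final run, if any
        return tokens;
    }

    /**
     * Assumes COBOL names cannot start or end with a hyphen: trims leftover
     * hyphens (e.g. a bare minus sign in "B - C") and skips empty results.
     */
    private static void flush(StringBuilder current, List<String> tokens) {
        int start = 0;
        int end = current.length();
        while (start < end && current.charAt(start) == '-') {
            start++;
        }
        while (end > start && current.charAt(end - 1) == '-') {
            end--;
        }
        if (start < end) {
            tokens.add(current.substring(start, end));
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String source = "MOVE WS-CA-PP00-PROCESS-YYMM TO WS-DATE. COMPUTE A = B - C.";
        // prints [MOVE, WS-CA-PP00-PROCESS-YYMM, TO, WS-DATE, COMPUTE, A, B, C]
        System.out.println(tokenize(source));
    }
}
```

Since COBOL is case-insensitive, a real analyzer would probably also lowercase the tokens (as Lucene's LowerCaseFilter does) so that queries match regardless of case.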