From: Erik Hatcher
Subject: Re: Does QueryParser uses Analyzer ?
Date: Tue, 30 Nov 2004 14:59:27 -0500
To: "Lucene Developers List"

On Nov 30, 2004, at 2:29 PM, Ricardo Lopes wrote:

> > My guess is that your analyzer is what did the splitting
>
> After looking at the code more carefully, I found that the
> tokenStream method in the BrazilianAnalyzer calls the
> StandardTokenizer, and it is this one that splits the search string. Is
> there a simple way to subclass the tokenizer to avoid splitting those
> characters, or do I have to make a custom implementation of that class?

You can verify this by using the AnalysisDemo referenced here:

	http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Or use Luke (http://www.getopt.org/luke/), which has a nice plugin page that can do this type of analysis inspection. You'll have to add the sandbox analyzer JAR to the classpath when launching Luke.

As for subclassing StandardTokenizer: no, you won't have much luck there. StandardTokenizer is a JavaCC-based tokenizer and is not designed to be subclassed to control this sort of thing.

> As this only happens when I make a search (during indexing the
> splitting of those characters doesn't happen)

Are you sure that splitting is not happening during indexing? If running the AnalysisDemo (or Luke) on your string shows it being split, then it is being split at indexing time too. Keep in mind that looking at a field's value shows you the stored *original* value, not the tokenized values.

> I thought that it had to do with the QueryParser, but it seems that
> the problem is with the StandardTokenizer.

I'm not sure; I haven't tried that string with the analyzer you provided.
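To make the stored-value-versus-tokens point concrete, here is a toy sketch in plain Java. It is not Lucene code: splittingAnalyze is a hypothetical stand-in for an analyzer that splits on punctuation (roughly the behavior described in this thread, not StandardTokenizer's actual rules), whitespaceAnalyze is a hypothetical alternative analyzer, and the field value "ABC-123/XYZ" is made up. It shows that the stored value keeps the original string while the "index" only holds the analyzed terms, and that a query only matches when it is analyzed the same way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SameAnalyzerDemo {

    // Hypothetical toy analyzer (NOT Lucene's StandardTokenizer): lowercases the
    // text and splits on every run of characters that are not letters or digits.
    static List<String> splittingAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) { // split() can yield a leading empty string
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A second hypothetical analyzer that only lowercases and splits on whitespace.
    static List<String> whitespaceAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Toy "indexing": the stored value stays as-is elsewhere;
    // only the analyzed terms go into the term set.
    static Set<String> index(String storedValue) {
        return new HashSet<>(splittingAnalyze(storedValue));
    }

    // A query matches only if it has terms and every one of them was indexed.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String stored = "ABC-123/XYZ"; // hypothetical field value
        Set<String> indexed = index(stored);
        String query = "abc-123/xyz";

        System.out.println("stored value:   " + stored);
        System.out.println("indexed terms:  " + splittingAnalyze(stored));
        // Same analyzer at query time: the query is split the same way, so it matches.
        System.out.println("same analyzer:  " + matches(indexed, splittingAnalyze(query)));
        // Different analyzer at query time: "abc-123/xyz" was never indexed as one term.
        System.out.println("other analyzer: " + matches(indexed, whitespaceAnalyze(query)));
    }
}
```

Running main prints the stored value unchanged, the indexed terms [abc, 123, xyz], true for the same-analyzer query and false for the whitespace-analyzed one. With real Lucene analyzers, the equivalent check is inspecting the tokens with the AnalysisDemo or Luke.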
If it were StandardTokenizer, and you're using the same analyzer for indexing and searching, the values would be split in both places, which is actually fine, since searches would match what was indexed :)

	Erik