From: Cheolgoo Kang
To: Lucene Users List
Date: Fri, 2 Jul 2004 23:57:40 +0900
Subject: Re: Problems with special characters

How about creating a special-character-converting reader like this?

    import java.io.IOException;
    import java.io.Reader;

    public class LuceneReader extends Reader {

        private final Reader source;
        // Special character held back for the next read(); -1 means none.
        private int pending = -1;

        public LuceneReader( Reader sourceReader ) {
            this.source = sourceReader;
        }

        public int read() throws IOException {
            if ( pending != -1 ) {
                int result = pending;
                pending = -1;
                return result;
            }
            // Keep the value as an int so that -1 (end of stream) survives.
            int result = source.read();
            if ( result != -1 && isSpecialCharacter( (char) result ) ) {
                pending = result;
                return '\\';
            }
            return result;
        }

        // Reader is abstract, so the bulk read and close() are required too.
        public int read( char[] cbuf, int off, int len ) throws IOException {
            int n = 0;
            for ( int c; n < len && ( c = read() ) != -1; n++ ) {
                cbuf[off + n] = (char) c;
            }
            return ( n == 0 && len > 0 ) ? -1 : n;
        }

        public void close() throws IOException {
            source.close();
        }

        private boolean isSpecialCharacter( char c ) {
            return ( c == '+' /* all special characters */ );
        }
    }

LuceneReader.read() checks each character before returning it. If the
character is one of the special characters, read() buffers it and returns
'\' instead; the next call then returns the buffered character, so the
special character comes out escaped.

I wrote this off the top of my head and it is of course not complete, but
it can be your starting point.

Cheolgoo

On Fri, 2 Jul 2004 12:44:48 +0200, Marten Senkel wrote:
>
> I had a similar problem.
> I don't know whether there is a more intelligent solution, but the
> quickest I had in mind was to convert the special characters I needed to
> look up into a fixed random character string. For example: prior to
> indexing I replace all occurrences of '+' by 'PLUSsdfaEGsgfAE'.
>
> When searching I intercept the terms the user entered, replace '+' by the
> same random character string and search for it instead of the original
> special character.
> This works, of course, only if one constructs the query oneself, giving
> the user only some basic checkbox options to specify 'AND' or 'OR'
> queries, for example.
>
> If you use something like this, users won't be able to write 'advanced'
> searches like +foo +bar themselves, as the command sign '+' would be
> converted as well.
> A fix for that problem could be to convert 'C+' to a random string and
> replace only 'C+' by the random string when searching ... this would
> leave the command '+' intact.
>
> It's a very basic and quick & dirty solution, I know, but it worked well
> for me.
>
> Marten
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
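
For comparison, the placeholder approach Marten describes in the quoted message could be sketched roughly as below. This is only an illustration under assumptions: the class name PlaceholderEncoder and the method encode are hypothetical, and only the '+' to 'PLUSsdfaEGsgfAE' mapping comes from the thread. It uses plain string handling and no Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: replaces special characters with
// fixed placeholder strings. The same encode() call has to be applied to the
// text before indexing and to the user's search terms before querying.
class PlaceholderEncoder {

    // Only the '+' entry is taken from the thread; further entries would be
    // chosen by whoever adopts the approach.
    private static final Map<Character, String> PLACEHOLDERS = new LinkedHashMap<>();
    static {
        PLACEHOLDERS.put('+', "PLUSsdfaEGsgfAE");
    }

    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String placeholder = PLACEHOLDERS.get(c);
            if (placeholder != null) {
                out.append(placeholder);   // special character: emit its placeholder
            } else {
                out.append(c);             // ordinary character: copy unchanged
            }
        }
        return out.toString();
    }
}
```

With this sketch, PlaceholderEncoder.encode("C++") yields "CPLUSsdfaEGsgfAEPLUSsdfaEGsgfAE". As Marten notes, encoding every '+' also swallows the '+' query operator, so it only fits when the application builds the query itself.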