lucene-java-user mailing list archives

From Cheolgoo Kang <app...@gmail.com>
Subject Re: Problems with special characters
Date Fri, 02 Jul 2004 14:57:40 GMT
How about creating a special-char-converting-reader like this?

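import java.io.IOException;
import java.io.Reader;

public class LuceneReader extends Reader {
 private final Reader source;
 // Special character held back until the next read(), or -1 if none.
 private int buffer = -1;
 public LuceneReader( Reader sourceReader ) {
  this.source = sourceReader;
 }
 @Override
 public int read() throws IOException {
  if ( buffer != -1 ) {
   int result = buffer;
   buffer = -1;
   return result;
  }
  int result = source.read();
  // Check for end of stream (-1) before treating the value as a char.
  if ( result != -1 && isSpecialCharacter( (char) result ) ) {
   buffer = result;
   return '\\';
  }
  return result;
 }
 // Reader declares this method abstract; fill the array via read().
 @Override
 public int read( char[] cbuf, int off, int len ) throws IOException {
  int n = 0;
  while ( n < len ) {
   int c = read();
   if ( c == -1 ) break;
   cbuf[off + n++] = (char) c;
  }
  return ( n == 0 && len > 0 ) ? -1 : n;
 }
 @Override
 public void close() throws IOException {
  source.close();
 }
 private boolean isSpecialCharacter( char c ) {
  return ( c == '+' /* all special characters */ );
 }
}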

LuceneReader.read() above checks each character before returning it.
If it is one of the special characters, it buffers the character and returns '\'
first; the next call to read() then returns the buffered character.

I just wrote this off the top of my head, so it is certainly not complete, but it
can be your starting point.
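
For a short query string held in memory, the same escaping can probably be done without a Reader at all. The following is only a minimal sketch: QueryEscaper and its SPECIAL constant are hypothetical names, not Lucene classes, and the character set is an assumed list of single-character query operators that you would adjust to your own needs.

```java
// Hypothetical helper, not part of Lucene.
final class QueryEscaper {
    // Assumed set of single characters to escape; adjust as needed.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\";

    // Returns a copy of text with a backslash placed before each special character.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // emit the backslash first, then the character itself
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (intro)")); // prints C\+\+ \(intro\)
    }
}
```

Whether the escaped characters then survive into the index still depends on the analyzer in use, so this only covers the query-string side.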

Cheolgoo


On Fri, 2 Jul 2004 12:44:48 +0200, Marten Senkel
<msenkel@europe.sial.com> wrote:
> 
> 
> I had a similar problem.
> I don't know whether there is a more intelligent solution, but the quickest one I had in
> mind was to convert the special characters I needed to look up into a fixed random
> character string. For example: prior to indexing I replace all occurrences of '+' with
> 'PLUSsdfaEGsgfAE'.
> 
> When searching, I intercept the terms the user entered, replace '+' with the same random
> character string, and search for that instead of the original special character.
> This works, of course, only if you construct the query yourself and give the user only
> some basic checkbox options to specify 'AND' or 'OR' queries, for example.
> 
> If you use something like this, users can't write 'advanced' searches like
> +foo +bar themselves, because the command sign '+' would be converted as well.
> A fix for that problem could be to convert 'C+' to a random string and replace only 'C+'
> with the random string when searching ... this would leave the command '+' intact.
> 
> It's a very basic and quick & dirty solution, I know, but it worked well for me.
> 
> Marten
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
>
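
The placeholder approach quoted above could be sketched roughly as follows. This is an illustration only: PlaceholderEncoder, encodeAll and encodeTerm are hypothetical names, and the placeholder string is the one from the quoted message. The same method has to be applied to the document text before indexing and to the user's terms before searching.

```java
// Hypothetical helper illustrating the placeholder idea; not part of Lucene.
final class PlaceholderEncoder {
    // Placeholder taken from the quoted message; any string unlikely to occur in real text would do.
    static final String PLUS_TOKEN = "PLUSsdfaEGsgfAE";

    // Replaces every '+' with the placeholder, including any '+' meant as a query operator.
    static String encodeAll(String text) {
        return text.replace("+", PLUS_TOKEN);
    }

    // Replaces only one known literal term (for example "C+") with its own placeholder,
    // so a '+' used as a query operator elsewhere in the text is left intact.
    static String encodeTerm(String text, String term, String placeholder) {
        return text.replace(term, placeholder);
    }

    public static void main(String[] args) {
        System.out.println(encodeAll("a+b"));
        System.out.println(encodeTerm("+foo +C+", "C+", "C" + PLUS_TOKEN));
    }
}
```

The term-based variant needs one entry per literal term you want to protect, which is the price for keeping '+' usable as an operator.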