lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brian Goetz <br...@quiotix.com>
Subject Re: searching words starting with accent characters using UTF-8
Date Mon, 10 Dec 2001 08:49:18 GMT
OK, mea culpa.  The new query parser is not as "improved" as I'd hoped
it would be.  My bad.

In my defense, I'd say that I've written lots of parsers, but always
for development tools, which have different error-handling
characteristics than end-user tools, which the query parser is.
But that's not really relevant.  

When I wrote the query parser, I defined query terms by inclusion
rather than exclusion, and the result was that a lot of valid queries
are not accepted.  It shouldn't be hard to fix this problem once and
for all, and I'm happy to do so, except that there are a few things
about it I want to discuss first.

The query parser was later extended to support fuzzy and wildcard
queries.  A few people here, including Doug and myself, questioned the
wisdom of putting chainsaws in the hands of the untrained; innocent,
inadvertent misuse of these features effectively constitute a
denial-of- service attack.  I'm of the opinion that these advanced
features should be left to the programmatic query classes, or perhaps
to a separate "power-user query parser", and not part of the basic
query language.  This discussion was started about a month and a half
ago, but never came to any agreement or even any real discussion.  

I'm perfectly willing to redo it, but I'd like to see a decision on
the above first.  (You all know my opinion; I'm willing to hear
reasoned arguments on the other side, but the "backward compatibility"
argument, which was raised earlier, will be ignored -- we're allowed
to decide that a recent feature is a bad idea and kill it.)

-Brian


On Mon, Dec 10, 2001 at 09:38:51AM +0100, Karl Øie wrote:
> please note that the code in rc1 and rc2 is not working with Norwegian
> chars(don't know about accents), I had to use the version 1.6 from the cvs
> repository.
> 
> files can be found here, after downlaoding you will have to rebuild lucene.
> 
> http://cvs.apache.org/viewcvs/jakarta-lucene/src/java/org/apache/lucene/quer
> yParser/
> 
> mvh karl øie
> 
> 
> -----Original Message-----
> From: Kiran Kumar K.G [mailto:kiran.k@net-kraft.com]
> Sent: 8. desember 2001 12:43
> To: lucene-user@jakarta.apache.org
> Subject: searching words starting with accent characters using UTF-8
> 
> 
> 
> Iam trying to search for words starting with accent characters.As posted in
> one of the Lucene user-list mail I tried using UTF-8 encoding
> String query = this.request.getParameter( "query" );
> if( query!=null ) {
> query = new String( query.getBytes(), "UTF-8" );
> }
> 
> But this code is not working properly.
> If I give a word æran, its giving empty string
> 
> Somebody in the user list had mentioned that this is working fine.
> 
> Please mention any dependency for this code to work(OS,JDK version etc.)
> 
> Let me know is there any other way to handle words starting with special
> characters.
> 
> Thanx,
> Kiran
> 
> 
> 
> 
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message