Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Message-ID: <001501c19078$e10745b0$024a1390@trollw2kserver>
Reply-To: "Brian Brown" <brian.brown@wanadoo.fr>
From: "Brian Brown" <brian.brown@wanadoo.fr>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
References: <4BC270C6AB8AD411AD0B00B0D0493DF0EE7D74@mail.grandcentral.com>
 <20011211034401.A14573@lx.quiotix.com>
Subject: Re: searching words starting with accent characters using UTF-8
Date: Sat, 29 Dec 2001 15:55:38 +0100
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

I am developing a French search engine using Lucene. To this end I used
Gerhard Schwarz's German analyser as a starting point. This seems to work
OK; the main difference is that I am using a lookup-table approach rather
than stemming by calculation.
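
To illustrate what I mean by a lookup table, here is a minimal Java
sketch. It is not the analyser's actual code: the class name
LookupStemmer, the method stem and the three table entries are
hypothetical and only show the idea of mapping an inflected form
straight to its stem.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of lookup-table stemming; not the real analyser code.
class LookupStemmer {
    // Inflected form -> stem. The entries are illustrative only.
    private static final Map<String, String> STEMS = new HashMap<>();
    static {
        STEMS.put("chevaux", "cheval");
        STEMS.put("journaux", "journal");
        STEMS.put("yeux", "oeil");
    }

    // Return the stem listed in the table, or the lower-cased term
    // unchanged when the table has no entry for it.
    static String stem(String term) {
        String key = term.toLowerCase(Locale.FRENCH);
        return STEMS.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        System.out.println(stem("Chevaux"));
        System.out.println(stem("maison"));
    }
}
```

Irregular forms cost one table entry each, whereas a calculated stemmer
would need a special rule for them.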

I find it is also necessary to adapt the QueryParser for accented
characters. My approach is simply to add the individual accented
characters to the #_TERM_CHAR and #_TERM_START_CHAR character sets.
My question is: what is the purpose of adding all the characters
"\u0080"-"\uFFFE", which I find in the current source?
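
To make the question concrete, here is how I read that range, as a
small Java sketch. The class TermCharRange and the method
inExtendedRange are hypothetical; this is not the generated parser
code, it only tests whether a character falls between U+0080 and
U+FFFE.

```java
// Hypothetical helper: mirrors only the U+0080 to U+FFFE range from the
// grammar, not the rest of the term character sets.
class TermCharRange {
    // True when c lies in the inclusive range U+0080 .. U+FFFE.
    static boolean inExtendedRange(char c) {
        return c >= '\u0080' && c <= '\uFFFE';
    }

    public static void main(String[] args) {
        char eAcute = '\u00E9'; // e with acute accent
        System.out.println(inExtendedRange(eAcute));
        System.out.println(inExtendedRange('e'));
    }
}
```

If my reading is right, an accented Latin-1 letter such as U+00E9 is
already inside that range, while plain ASCII letters are not.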

Brian Brown
----- Original Message -----
From: "Brian Goetz" <brian@quiotix.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Tuesday, December 11, 2001 12:44 PM
Subject: Re: searching words starting with accent characters using UTF-8


> > Thanks!  That would be great!
>
> Be careful what you ask for, I foobared it up the last time... :)
>
> > Yes, this is a lot of features, and a lot of syntax.  The query parser is
> > already complicated.  Perhaps we should instead write a number of example
> > query parsers that do different things, and encourage folks to write their
> > own, with these as models.  Unfortunately, I'm not sure many folks would do
> > that: instead they would ask why one parser doesn't have a feature that
> > another does.  So I'm having a hard time seeing a non-kitchen-sink
> > alternative.  Do you?
>
> I don't really object to a kitchen sink approach, but I prefer to have
> it done all at once rather than added incrementally.
>
> So far we have:
>  - Prefix (currently *)
>  - Fuzzy (currently ~)
>  - Boost (currently ^nn)
>  - AND, OR, NOT, &&, ||, !
>  - Phrases ("foo bar")
>
> We want to add:
>  - NEAR/phrase-with-slop
>
> --
> To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
