From: Mark Miller
Date: Mon, 28 May 2007 12:49:00 -0400
To: java-user@lucene.apache.org
Subject: Re: Very odd behaviour of FrenchAnalyzer with strings in capital letters

FrenchAnalyzer has a stemmer built in. You are seeing the result of that
stemmer in action. If you do not want stemming, take a look at the code
for FrenchAnalyzer and copy it to make your own; just remove the
FrenchStemFilter.

- Mark

Jolinar13 wrote:
> Finally, I use the standard analyzer with some custom stop words:
> le,la,les,l',un,une,des,d',à,au,de,et,en,dans,se,sont,qui,a,est,il,pour,que,du,sa,par,mais,sur,avec,aux,ce,d,s,l,ou,pas,ses
> Thanks anyway
> Florian
>
> Jolinar13 wrote:
>
>> It looks like it removes the last letter if the word ends with an 'a', 'e'
>> or 'i'.
>> Femelles => all:femel
>> Is this expected?
>> How to use FrenchAnalyzer?
>> Thanks
>> Florian
>>
>> Jolinar13 wrote:
>>
>>> Some terms I tested:
>>> vehicle => all:vehicl
>>> vehiCle => all:vehicle
>>> Vehicle => all:vehicl
>>> VeHicle => all:vehicle
>>> VEHICLE => all:vehicle
>>> vehicles => all:vehicl
>>> paris => all:par
>>> :S
>>>
>>> Jolinar13 wrote:
>>>
>>>> Thanks to Luke, I realized my terms were not parsed correctly, and this
>>>> has nothing to do with upper case!
>>>> It seems to happen when the word ends with "*i". For example "giovanni"
>>>> is parsed "giovann".
>>>> Something about this?
>>>> Florian
>>>>
>>>> Jolinar13 wrote:
>>>>
>>>>> Hello Mark!
>>>>> Thank you a lot for your answer.
>>>>> You are right for the Luke part. My Luke version was too old. My bad.
>>>>> But with Luke I still observe the problem I described.
>>>>> Any idea how to sort this out?
>>>>> Maybe this has to do with the fact I use Compass?
>>>>> Thank you
>>>>> Florian
>>>>>
>>>>>>>>> I got strange
>>>>>>>>> search results on strings in uppercase. (example : VEHICLE)
>>>>>>>>> When I search the string (in lower case), I get no result. I get results
>>>>>>>>> if I use "vehicle*" or "vehiclE", or "vehicLe" etc.
>>>>>>>>>
>>>>>>>>> What is odd is that it affects only some of the strings, not all of them.
>>>>>>>>>
>>>>> markrmiller wrote:
>>>>>
>>>>>> FrenchAnalyzer does lowercase, and using it would not in any way alter
>>>>>> Luke's ability to read your index.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> Jolinar13 wrote:
>>>>>>
>>>>>>> Hello Erick,
>>>>>>> Still no idea about my problem?
>>>>>>> Anybody here using the FrenchAnalyzer?
>>>>>>> Thanks,
>>>>>>> Florian
>>>>>>>
>>>>>>> Jolinar13 wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Thank you for your quick answer.
>>>>>>>> I use Luke to examine the index, but since I switched to FrenchAnalyzer,
>>>>>>>> it says 'Not a Lucene index'.
>>>>>>>> If I open the index files in a text viewer, the strings are in UPPER
>>>>>>>> case.
>>>>>>>> I do use the same analyzer to index and search.
>>>>>>>> So, do I have to specify the FrenchAnalyzer not to be case
>>>>>>>> sensitive? How to do that?
>>>>>>>> Thanks a lot
>>>>>>>> Florian
>>>>>>>>
>>>>>>>> Erick Erickson wrote:
>>>>>>>>
>>>>>>>>> First have you gotten a copy of Luke to examine your index to see
>>>>>>>>> what's actually indexed?
>>>>>>>>>
>>>>>>>>> The default behavior is usually to lowercase everything, but I'm not
>>>>>>>>> entirely sure if the French analyzer does this. But I suspect so.
>>>>>>>>>
>>>>>>>>> Searches are case sensitive. To get caseless searching, you need
>>>>>>>>> to put everything in the same case. This is usually done for you with
>>>>>>>>> any of the standard analyzers, but check specifically.
>>>>>>>>>
>>>>>>>>> Are you using the same analyzer at index AND search time?
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Erick
>>>>>>>>>
>>>>>>>>> On 5/21/07, Jolinar13 wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I tried org.apache.lucene.analysis.fr.FrenchAnalyzer and I got strange
>>>>>>>>>> search results on strings in uppercase. (example : VEHICLE)
>>>>>>>>>> When I search the string (in lower case), I get no result. I get results
>>>>>>>>>> if I use "vehicle*" or "vehiclE", or "vehicLe" etc.
>>>>>>>>>>
>>>>>>>>>> What is odd is that it affects only some of the strings, not all of them.
>>>>>>>>>> Anyone who has ever experienced this problem?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Florian
>>>>>>>>>> --
>>>>>>>>>> View this message in context:
>>>>>>>>>> http://www.nabble.com/Very-odd-behaviour-of-FrenchAnalyzer-with-strings-in-capital-letters-tf3789153.html#a10715673
>>>>>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
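
The workaround Florian settled on in this thread (standard analysis plus a custom stop word list, with no stemming) can be illustrated with a small plain-Java sketch. This is only a stand-in for the idea, not Lucene code: the class name `NoStemAnalysisSketch`, the method `analyze`, and the simple split-on-non-letters tokenization are all assumptions made for illustration, and a real setup would use Lucene's own analyzer classes. Only the stop word list is taken from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for "lowercase + custom stop words, no stemming".
// NOT Lucene's API: the class and method names here are hypothetical.
class NoStemAnalysisSketch {

    // Stop words copied from Florian's message ("\u00e0" is the letter à).
    // Entries with an apostrophe never match in this sketch, because the
    // tokenizer below splits on the apostrophe; "l" and "d" cover those cases.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "le", "la", "les", "l'", "un", "une", "des", "d'", "\u00e0", "au",
        "de", "et", "en", "dans", "se", "sont", "qui", "a", "est", "il",
        "pour", "que", "du", "sa", "par", "mais", "sur", "avec", "aux",
        "ce", "d", "s", "l", "ou", "pas", "ses"));

    // Lowercase the text, split on anything that is not a letter or digit,
    // and drop stop words. Terms are kept whole: nothing is stemmed.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        String lowered = text.toLowerCase(Locale.ROOT);
        for (String token : lowered.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Les VEHICLES de Paris"));
        System.out.println(analyze("giovanni"));
    }
}
```

With this kind of chain, "VEHICLE" and "vehicle" produce the same term and "giovanni" is left intact. The trade-off is that "vehicle" and "vehicles" become two different terms, since nothing reduces them to a common stem.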