lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Very odd behaviour of FrenchAnalyzer with strings in capital letters
Date Mon, 28 May 2007 16:49:00 GMT
FrenchAnalyzer has a stemmer built in, and you are seeing the result of that
stemmer in action. If you don't want stemming, look at the code for
FrenchAnalyzer and copy it to make your own analyzer; just remove the
FrenchStemFilter.
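In Lucene this advice usually means writing a custom `Analyzer` that chains the same tokenizer and filters as FrenchAnalyzer, minus `FrenchStemFilter`. The method to override differs between Lucene versions, so it is not shown here. The sketch below is a plain-Java model of the idea, not the Lucene API. Each stage rewrites the token list, much as a token filter wraps the stream before it, so dropping stemming just means leaving a stage out. `AnalysisChain`, `lowerCase` and `toyStem` are hypothetical names, and `toyStem` is only a stand-in, not the French stemming algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Plain-Java model of an analysis chain (hypothetical, not Lucene classes).
final class AnalysisChain {
    private final List<UnaryOperator<List<String>>> stages = new ArrayList<>();

    // Append a stage; stages run in the order they were added.
    AnalysisChain add(UnaryOperator<List<String>> stage) {
        stages.add(stage);
        return this;
    }

    List<String> analyze(String text) {
        // Simplified tokenizer: split on anything that is not a letter.
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        for (UnaryOperator<List<String>> stage : stages) {
            tokens = stage.apply(tokens);
        }
        return tokens;
    }

    // Lowercase every token with a fixed locale.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Hypothetical toy stemmer: drops a trailing "s", then a trailing "e".
    static List<String> toyStem(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String s = t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
            if (s.endsWith("e")) s = s.substring(0, s.length() - 1);
            out.add(s);
        }
        return out;
    }
}
```

For example, `new AnalysisChain().add(AnalysisChain::lowerCase).analyze("Vehicles VEHICLE")` leaves both tokens unstemmed as `vehicles` and `vehicle`. Adding `AnalysisChain::toyStem` after `lowerCase` reduces `Vehicles` to `vehicl`.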

- Mark

Jolinar13 wrote:
> In the end, I used the standard analyzer with some custom stop words:
> le,la,les,l',un,une,des,d',à,au,de,et,en,dans,se,sont,qui,a,est,il,pour,que,du,sa,par,mais,sur,avec,aux,ce,d,s,l,ou,pas,ses
> Thanks anyway
> Florian
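The following is a plain-Java sketch of what this setup does to text, assuming lowercasing followed by a check against the stop-word list above. It does not use Lucene. The class name `StopWordModel` is hypothetical, and its tokenizer (a run of letters plus an optional trailing apostrophe) is a simplification, not StandardTokenizer's grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model: lowercase tokens, then drop the custom stop words.
final class StopWordModel {
    // The stop words from the message above ("\u00e0" is "à").
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "le", "la", "les", "l'", "un", "une", "des", "d'", "\u00e0", "au", "de",
        "et", "en", "dans", "se", "sont", "qui", "a", "est", "il", "pour", "que",
        "du", "sa", "par", "mais", "sur", "avec", "aux", "ce", "d", "s", "l",
        "ou", "pas", "ses"));

    // Simplified tokenizer: letters, optionally followed by one apostrophe,
    // so "l'avion" yields "l'" and "avion".
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+'?");

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lowercase before the stop-word check so "Les" matches "les".
            String term = m.group().toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) out.add(term);
        }
        return out;
    }
}
```

In this model `StopWordModel.analyze("L'avion et la voiture")` returns `[avion, voiture]`. Because no stemming is applied, "vehicle", "Vehicle" and "VEHICLE" all become the same term `vehicle`.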
>
>
> Jolinar13 wrote:
>   
>> It looks like it removes the last letter when a word ends with 'a', 'e'
>> or 'i'.
>> Femelles => all:femel
>> Is this expected?
>> How should FrenchAnalyzer be used?
>> Thanks
>> Florian
>>
>>
>> Jolinar13 wrote:
>>     
>>> Some terms I tested:
>>> vehicle => all:vehicl
>>> vehiCle => all:vehicle
>>> Vehicle => all:vehicl
>>> VeHicle => all:vehicle
>>> VEHICLE => all:vehicle
>>> vehicles => all:vehicl
>>> paris => all:par
>>> :S
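One hypothesis fits every result in the list above. It has two parts. First, the stemmer leaves a token alone if that token contains an uppercase letter anywhere except its first character. Second, stemming runs before lowercasing. The sketch below models only that guard and that order. The names `StemGuardModel`, `isStemmable` and `analyze` are illustrative, and the stem function passed in is a hypothetical toy. Check the FrenchAnalyzer and FrenchStemmer sources for your Lucene version before relying on this.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Models the hypothesis: a guarded stem step first, lowercasing afterwards.
final class StemGuardModel {
    // Illustrative guard: letters only, and no uppercase letter after the
    // first character (so "Vehicle" passes, "vehiCle" and "VEHICLE" do not).
    static boolean isStemmable(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (!Character.isLetter(c)) return false;
            if (i > 0 && Character.isUpperCase(c)) return false;
        }
        return true;
    }

    // Stem only when the guard allows it, then lowercase the result.
    static String analyze(String term, UnaryOperator<String> stem) {
        String s = isStemmable(term) ? stem.apply(term) : term;
        return s.toLowerCase(Locale.ROOT);
    }
}
```

With a toy stem that drops a final "e", this reproduces the list: `vehicle` and `Vehicle` become `vehicl`, while `vehiCle`, `VeHicle` and `VEHICLE` become `vehicle`. Under this model, a document containing VEHICLE is indexed as `vehicle`, while the lowercase query "vehicle" is stemmed to `vehicl`, so the two never match. A query such as "vehiclE" bypasses the stemmer and does match.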
>>>
>>>
>>> Jolinar13 wrote:
>>>       
>>>> Thanks to Luke, I realized my terms were not analyzed correctly, and that
>>>> this has nothing to do with upper case!
>>>> It seems to happen when the word ends with an 'i'. For example, "giovanni"
>>>> is indexed as "giovann".
>>>> Any idea about this?
>>>> Florian
>>>>
>>>>
>>>> Jolinar13 wrote:
>>>>         
>>>>> Hello Mark!
>>>>> Thank you very much for your answer.
>>>>> You are right about the Luke part: my Luke version was too old. My bad.
>>>>> But with Luke I still observe the problem I described.
>>>>> Any idea how to sort this out?
>>>>> Maybe this has to do with the fact that I use Compass?
>>>>> Thank you
>>>>> Florian
>>>>>
>>>>>           
>>>>>>>>> I got strange search results on strings in uppercase (example: VEHICLE).
>>>>>>>>> When I search for the string in lower case, I get no results. I get
>>>>>>>>> results if I use "vehicle*", "vehiclE", "vehicLe", etc.
>>>>>>>>>
>>>>>>>>> What is odd is that it affects only some of the strings, not all of
>>>>>>>>> them.
>>>>>>>>>                   
>>>>> markrmiller wrote:
>>>>>           
>>>>>> FrenchAnalyzer does lowercase, and using it would not in any way alter
>>>>>> Luke's ability to read your index.
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> Jolinar13 wrote:
>>>>>>             
>>>>>>> Hello Erick,
>>>>>>> Still no idea about my problem?
>>>>>>> Anybody here using the FrenchAnalyzer?
>>>>>>> Thanks,
>>>>>>> Florian
>>>>>>>
>>>>>>>
>>>>>>> Jolinar13 wrote:
>>>>>>>   
>>>>>>>               
>>>>>>>> Hello,
>>>>>>>> Thank you for your quick answer.
>>>>>>>> I use Luke to examine the index, but since I switched to
>>>>>>>> FrenchAnalyzer, it says 'Not a Lucene index'.
>>>>>>>> If I open the index files in a text viewer, the strings are in
>>>>>>>> UPPER case.
>>>>>>>> I do use the same analyzer to index and search.
>>>>>>>> So, do I have to configure FrenchAnalyzer to be case-insensitive?
>>>>>>>> How do I do that?
>>>>>>>> Thanks a lot
>>>>>>>> Florian
>>>>>>>>
>>>>>>>>
>>>>>>>> Erick Erickson wrote:
>>>>>>>>     
>>>>>>>>                 
>>>>>>>>> First, have you gotten a copy of Luke to examine your index and see
>>>>>>>>> what's actually indexed?
>>>>>>>>>
>>>>>>>>> The default behavior is usually to lowercase everything, but I'm not
>>>>>>>>> entirely sure whether the French analyzer does this. I suspect so.
>>>>>>>>>
>>>>>>>>> Searches are case sensitive. To get case-insensitive searching, you
>>>>>>>>> need to put everything in the same case. This is usually done for you
>>>>>>>>> by any of the standard analyzers, but check specifically.
>>>>>>>>>
>>>>>>>>> Are you using the same analyzer at index AND search time?
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Erick
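Erick's two points (terms match exactly, and the same analysis must be applied at index and search time) can be seen in a toy inverted index. This is plain Java, not Lucene. `ToyIndex` and its methods are hypothetical, and the per-word "analyzer" is just a string function.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Toy inverted index: terms are stored and looked up as exact strings.
final class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final UnaryOperator<String> indexAnalyzer;

    ToyIndex(UnaryOperator<String> indexAnalyzer) {
        this.indexAnalyzer = indexAnalyzer;
    }

    // Index each whitespace-separated word after running the index analyzer.
    void add(int docId, String text) {
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue;
            postings.computeIfAbsent(indexAnalyzer.apply(word), k -> new HashSet<>()).add(docId);
        }
    }

    // Exact-term lookup; a query analyzer that differs from the index
    // analyzer can produce a term that was never indexed.
    Set<Integer> search(String word, UnaryOperator<String> queryAnalyzer) {
        return postings.getOrDefault(queryAnalyzer.apply(word), Set.of());
    }

    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

With `ToyIndex::lower` on both sides, a document containing "VEHICLE" is found by the query "Vehicle". Searching the same index with an identity analyzer (`s -> s`) finds nothing.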
>>>>>>>>>
>>>>>>>>> On 5/21/07, Jolinar13 <jolinar13@gmail.com> wrote:
>>>>>>>>>       
>>>>>>>>>                   
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I tried org.apache.lucene.analysis.fr.FrenchAnalyzer and I got strange
>>>>>>>>>> search results on strings in uppercase (example: VEHICLE).
>>>>>>>>>> When I search for the string in lower case, I get no results. I get
>>>>>>>>>> results if I use "vehicle*", "vehiclE", "vehicLe", etc.
>>>>>>>>>>
>>>>>>>>>> What is odd is that it affects only some of the strings, not all of
>>>>>>>>>> them.
>>>>>>>>>> Has anyone ever experienced this problem?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Florian
>>>>>>>>>> --
>>>>>>>>>> View this message in context:
>>>>>>>>>> http://www.nabble.com/Very-odd-behaviour-of-FrenchAnalyzer-with-strings-in-capital-letters-tf3789153.html#a10715673
>>>>>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>         
>>>>>>>>>>                     
>>>>>>>>>       
>>>>>>>>>                   
>>>>>>>>     
>>>>>>>>                 
>>>>>>>   
>>>>>>>               
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>           
>>>>         
>>>       
>>     
>
>   


