lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jolinar13 <jolina...@gmail.com>
Subject Re: Very odd behaviour of FrenchAnalyzer with strings in capital letters
Date Mon, 28 May 2007 15:25:28 GMT

Finally, I use the standard analyzer with some custom stop words :
le,la,les,l',un,une,des,d',à,au,de,et,en,dans,se,sont,qui,a,est,il,pour,que,du,sa,par,mais,sur,avec,aux,ce,d,s,l,ou,pas,ses
Thanks anyway
Florian


Jolinar13 wrote:
> 
> It looks like it remove the letter in the end, if it ends with an 'a', 'e'
> or 'i'.
> Femelles => all:femel
> Is this expected?
> How to use FrenchAnalyzer?
> Thanks
> Florian
> 
> 
> Jolinar13 wrote:
>> 
>> Some terms I tested :
>> vehicle => all:vehicl
>> vehiCle => all:vehicle
>> Vehicle => all:vehicl
>> VeHicle => all:vehicle
>> VEHICLE => all:vehicle
>> vehicles => all:vehicl
>> paris => all:par
>> :S
>> 
>> 
>> Jolinar13 wrote:
>>> 
>>> Thanks to Luke, I realized my terms were not parsed correctly, and this
>>> has nothing to do with upper case!
>>> It seems to happen when the word ends with "*i". For example "giovanni"
>>> is parsed "giovann".
>>> Something about this?
>>> Florian
>>> 
>>> 
>>> Jolinar13 wrote:
>>>> 
>>>> Hello Mark!
>>>> Thank you a lot for your answer.
>>>> You are right for the Luke part. My Luke version was too old. My bad.
>>>> But with Luke I still observe the problem I described.
>>>> Any idea how to sort this out?
>>>> Maybe this has to do with the fact I use Compass?
>>>> Thank you
>>>> Florian
>>>> 
>>>>>>>> I got strange
>>>>>>>> search results on strings in uppercase. (example : VEHICLE)
>>>>>>>> When I search the string (in lower case), I get no result.
I get
>>>>>>>> results
>>>>>>>> if
>>>>>>>> I use "vehicle*" or "vehiclE", or "vehicLe" etc.
>>>>>>>>
>>>>>>>> What is odd is that it affects only some of the strings,
not all of
>>>>>>>> them.
>>>> 
>>>> 
>>>> markrmiller wrote:
>>>>> 
>>>>> FrenchAnalyzer does lowercase and using it would not in anyway alter

>>>>> Lukes ability to read your index.
>>>>> 
>>>>> - Mark
>>>>> 
>>>>> Jolinar13 wrote:
>>>>>> Hello Erick,
>>>>>> Still no idea about my problem?
>>>>>> Anybody here using the FrenchAnalyzer?
>>>>>> Thanks,
>>>>>> Florian
>>>>>>
>>>>>>
>>>>>> Jolinar13 wrote:
>>>>>>   
>>>>>>> Hello,
>>>>>>> Thank you for your quick answer.
>>>>>>> I use Luke to examine the index, but since I switched to
>>>>>>> FrenchAnalyzer,
>>>>>>> it says 'Not a Lucene index'.
>>>>>>> If I open the index files in a text viewer, the strings are in
UPPER
>>>>>>> case.
>>>>>>> I do use the same analyzer to index and search.
>>>>>>> So, do I have to specify the FrenchAnalyzer not to be case
>>>>>>> sensitive? How
>>>>>>> to do that?
>>>>>>> Thanks a lot
>>>>>>> Florian
>>>>>>>
>>>>>>>
>>>>>>> Erick Erickson wrote:
>>>>>>>     
>>>>>>>> First have you gotten a copy of Luke to examine your index
to see
>>>>>>>> what's actually indexed?
>>>>>>>>
>>>>>>>> The default behavior is usually to lowercase everything,
but I'm
>>>>>>>> not
>>>>>>>> entirely sure if the French analyzer does this. But I suspect
so.
>>>>>>>>
>>>>>>>> Searches are case sensitive. To get caseless searching, you
need
>>>>>>>> to put everything in the same case. This is usually done
for you
>>>>>>>> with
>>>>>>>> any of the standard analyzers, but check specifically.
>>>>>>>>
>>>>>>>> Are you using the same analyzer at index AND search time?
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Erick
>>>>>>>>
>>>>>>>> On 5/21/07, Jolinar13 <jolinar13@gmail.com> wrote:
>>>>>>>>       
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I tried org.apache.lucene.analysis.fr.FrenchAnalyzer
and I got
>>>>>>>>> strange
>>>>>>>>> search results on strings in uppercase. (example : VEHICLE)
>>>>>>>>> When I search the string (in lower case), I get no result.
I get
>>>>>>>>> results
>>>>>>>>> if
>>>>>>>>> I use "vehicle*" or "vehiclE", or "vehicLe" etc.
>>>>>>>>>
>>>>>>>>> What is odd is that it affects only some of the strings,
not all
>>>>>>>>> of
>>>>>>>>> them.
>>>>>>>>> Anyone who has ever experienced this problem?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Florian
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://www.nabble.com/Very-odd-behaviour-of-FrenchAnalyzer-with-strings-in-capital-letters-tf3789153.html#a10715673
>>>>>>>>> Sent from the Lucene - Java Users mailing list archive
at
>>>>>>>>> Nabble.com.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>         
>>>>>>>>       
>>>>>>>     
>>>>>>
>>>>>>   
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Very-odd-behaviour-of-FrenchAnalyzer-with-strings-in-capital-letters-tf3789153.html#a10837835
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message