lucene-java-user mailing list archives

From Vasiliki Gkouta <vgko...@csd.auth.gr>
Subject Re: Analyzer enquiry
Date Mon, 14 Mar 2011 13:35:40 GMT
Sorry for the confusion. I have two analyzers (both StandardAnalyzer) and
use no stemmers. I passed a German stop word set to the constructor of one
analyzer and an English stop word set to the constructor of the other. My
question is whether I have to call any other method on the German analyzer
for it to be correct.
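
The setup described above can be illustrated with a minimal sketch in plain Java. It is not Lucene code: the class `StopWordSketch`, the method `analyze`, and the two tiny stop word lists are hypothetical names and data made up for this example. It assumes that, with no stemmer configured, the stop word set is the only language-specific input, which is the situation described in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; none of these names come from Lucene.
class StopWordSketch {

    // Hypothetical, deliberately tiny stop word lists (real lists are far longer).
    static final Set<String> ENGLISH_STOP =
            new HashSet<>(Arrays.asList("the", "and", "is", "a", "of"));
    static final Set<String> GERMAN_STOP =
            new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist"));

    // Lowercase, split on anything that is not a letter or digit, drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String german = "Der Hund und die Katze";
        String english = "The dog and the cat";
        System.out.println(analyze(german, GERMAN_STOP));   // [hund, katze]
        System.out.println(analyze(english, ENGLISH_STOP)); // [dog, cat]
        // Wrong list for the language: the German stop words survive.
        System.out.println(analyze(german, ENGLISH_STOP));  // [der, hund, und, die, katze]
    }
}
```

The same tokenizing and lowercasing logic serves both languages here; only the set passed in changes, so each language-specific field needs its own analyzer instance built with the matching list.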

Thank you.


Quoting Erick Erickson <erickerickson@gmail.com>:

> I don't understand what you're saying here. If you put a stemmer in the
> constructor, you *are* using it. If you don't specify any stemmer at all, you
> still have to define different analyzers to use different stop word lists.
>
> Can you restate your question?
>
> Best
> Erick
>
> On Mon, Mar 14, 2011 at 8:21 AM, Vasiliki Gkouta <vgkouta@csd.auth.gr> wrote:
>> Thanks a lot for your help Erick! About the fields you mentioned: If I don't
>> use stemmers, except for the constructor argument related to the stop words,
>> is there anything else that I have to modify?
>>
>> Thanks,
>> Vicky
>>
>>
>> Quoting Erick Erickson <erickerickson@gmail.com>:
>>
>>> StandardAnalyzer works well for most European languages. The problem will
>>> be stemming. Applying stemming via English rules to non-English languages
>>> produces...er...interesting results.
>>>
>>> You can go ahead and create language-specific fields for each language and
>>> use StandardAnalyzer with the appropriate stopwords and stemming with
>>> each; this is a common approach. The Snowball stemmer takes a language
>>> parameter...
>>>
>>> You need to use specific analyzers for Chinese, Japanese, and Korean (CJK)
>>> documents, though.
>>>
>>> Hope that helps
>>> Erick
>>>
>>> On Sun, Mar 13, 2011 at 7:23 PM, Vasiliki Gkouta <vgkouta@csd.auth.gr> wrote:
>>>>
>>>> Hello everybody,
>>>>
>>>> I have an enquiry about StandardAnalyzer. Can I use it for languages
>>>> other than English? I give the right list of stop words at
>>>> initialization. Is there anything else inside the class that is set to
>>>> English by default? I've found the analyzers for other languages too,
>>>> but they seem to be deprecated. Moreover, I use English and other
>>>> languages together in my project, so I would like to ask if there is a
>>>> way to use either the same analyzer class for all of them, or analyzers
>>>> with the same functionality for all the languages. Thanks in advance!
>>>>
>>>> Best regards,
>>>> Vicky
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org