lucene-java-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Lucene roadmap for language analyzers
Date Mon, 22 Feb 2016 09:14:43 GMT
Hi,

Moving the discussion to the Lucene user list.

You may want to look at these references:
* http://lucene.472066.n3.nabble.com/JLemmaGen-project-td4097466.html
* https://github.com/Amice13/ukr_stemmer

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 18 Feb 2016, at 17:58, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> 
> Oops... just noticed - this discussion should be moved to the user mailing
> list, not the "general" list. Sorry for not noticing earlier.
> 
> -- Jack Krupansky
> 
> On Thu, Feb 18, 2016 at 11:57 AM, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
> 
>> Wow, the Ukrainian language sure is a challenge due to all of the political
>> and cultural forces over the centuries. See:
>> https://en.wikipedia.org/wiki/Ukrainian_language
>> 
>> So, the first question is: what is the central focus of your interest in
>> the Ukrainian language? Is it more on contemporary media (newspapers,
>> magazines, government documents), literature, and social media (blog posts,
>> tweets) in Kiev, or more on historic literature/books and official
>> documents from the 20th century? Or... what?
>> 
>> Which dialect(s) are your central focus? e.g., Middle Dnieprian ("the
>> basis of the Standard Literary Ukrainian")?
>> 
>> Can you give any examples of technical issues such as stemming,
>> punctuation, word boundaries, compound words, or stop words? Which modern
>> language is Ukrainian most similar to... Russian? How similar, how
>> dissimilar?
>> 
>> 
>> -- Jack Krupansky
>> 
>> On Thu, Feb 18, 2016 at 11:41 AM, Upayavira <uv@odoko.co.uk> wrote:
>> 
>>> Nurul,
>>> 
>>> You can search through JIRA [1] for Lucene issues regarding Ukrainian. I
>>> didn't find anything to suggest anyone is working on it.
>>> 
>>> What do you need Lucene to do that it currently doesn't? You may well be
>>> able to get away with using another language, or a more generic,
>>> non-language specific analysis for such languages.
>>> 
>>> As to who to pay - there's no specific set of people - anyone who both
>>> understands Lucene's internals, and understands (or can be helped to
>>> understand) the needs of the Ukrainian language should be able to do the
>>> work.
>>> 
>>> Upayavira
>>> 
>>> On Thu, Feb 18, 2016, at 03:55 PM, Nurul AMIN wrote:
>>>> Hello Upayavira,
>>>> 
>>>> Thanks for your email.
>>>> 
>>>> In that case, could you tell me whether the Lucene team is already
>>>> working on Ukrainian? If I need to pay, do you know how much it would cost
>>>> and whom I should contact?
>>>> 
>>>> Many thanks!
>>>> 
>>>> Best regards,
>>>> 
>>>> Nurul Amin
>>>> Manager, Software Development, Service Technology Group (STG)
>>>> Amadeus Customer Service (ACS)
>>>> Amadeus s.a.s.
>>>> France
>>>> T: +33 4 97 23 03 82
>>>> 
>>>> Done is better than perfect!
>>>> 
>>>> -----Original Message-----
>>>> From: Upayavira [mailto:uv@odoko.co.uk]
>>>> Sent: 18 February 2016 10:41
>>>> To: general@lucene.apache.org
>>>> Subject: Re: Lucene roadmap for language analyzers
>>>> 
>>>> Nurul,
>>>> 
>>>> Given the community-based, meritocratic nature of the Lucene project,
>>>> there is no 'roadmap' as such. Features are added when people need them
>>>> and can justify developing them.
>>>> 
>>>> The features you are requesting, if not present already, will be added
>>>> when someone needs them sufficiently to implement them, or to pay
>>>> someone to implement them.
>>>> 
>>>> Upayavira
>>>> 
>>>> 
>>>> On Wed, Feb 17, 2016, at 11:05 PM, Nurul AMIN wrote:
>>>>> Hello,
>>>>>
>>>>> I could not find a Lucene roadmap for language implementation. In fact,
>>>>> I am interested in the following languages:
>>>>>
>>>>> - Ukrainian
>>>>> - Hebrew
>>>>> - Bahasa
>>>>>
>>>>> It seems Lucene does not support those languages today
>>>>> (https://lucene.apache.org/core/5_4_1/analyzers-common/overview-summary.html).
>>>>>
>>>>> Do you know if future versions of Lucene will bring those languages?
>>>>>
>>>>> Many thanks in advance for your help.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Nurul Amin
>>> 
>> 
>> 
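Upayavira's suggestion of a "more generic, non-language specific analysis" for languages without a dedicated Lucene analyzer can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not Lucene's API: it uses only the JDK's java.text.BreakIterator to split text on word boundaries, keeps segments that contain a letter or digit, and lowercases them, with no stemming and no stop-word removal. Inside Lucene, a roughly comparable chain would be a StandardTokenizer followed by a LowerCaseFilter. The class and method names (GenericAnalysisSketch, tokenize) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of language-neutral analysis: word-boundary
// segmentation plus lowercasing, with nothing language-specific applied.
class GenericAnalysisSketch {

    static List<String> tokenize(String text) {
        // Locale.ROOT: no language-specific segmentation or casing rules.
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = text.substring(start, end);
            // Skip whitespace and punctuation segments between words.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Київ є столицею України."));
    }
}
```

Because nothing language-specific is applied, inflected forms stay distinct (for example, "України" and "Україна" become different tokens). That is the kind of gap a stemmer or lemmatizer, such as the projects Jan linked, aims to address.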


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

