lucene-issues mailing list archives

From "pavithra kariyawasam (Jira)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-9044) Currently Lucene doesn't have an analyzer for Sinhala. We have built an analyzer which consists of a language-dependent tokenizer, a stemming algorithm and a list of stop words.
Date Wed, 13 Nov 2019 13:11:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

pavithra kariyawasam updated LUCENE-9044:
-----------------------------------------
    Review Patch?:   (was: Yes)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built an analyzer which consists
> of a language-dependent tokenizer, a stemming algorithm and a list of stop words.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9044
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>         Environment: Lucene
>            Reporter: pavithra kariyawasam
>            Priority: Major
>              Labels: features
>             Fix For: 5.5.6
>
>         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, SinhalaTokenizer.java, stopwords.txt
>
>
> This component is based on three main research studies.
> Lucene did not have a component for analyzing Sinhala documents, so our intention is to fill
> that gap with an analyzer for Sinhala text. The Sinhala Analyzer is implemented using Sinhala
> morphological analysis. Its three main functions are tokenizing document content precisely,
> removing stop words, and reducing terms to their base/root form accurately. These are built
> according to the grammatical rules of Sinhala.
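
The three stages named in the description (tokenizing, stop-word removal, reduction to a root form) can be sketched as a simple pipeline. The plain-Java example below is a minimal sketch under stated assumptions, not the attached SinhalaAnalyzer.java. The class and method names are hypothetical. The stop-word set and suffix list are caller-supplied placeholders (English strings in the demo, not real Sinhala data). Longest-suffix stripping stands in for the actual SinhalaStemmer algorithm, which is not shown here. Inside Lucene itself, these stages would be chained as a Tokenizer followed by a StopFilter and a stemming TokenFilter in Analyzer.createComponents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical plain-Java sketch of a three-stage analysis pipeline:
 * tokenize, remove stop words, reduce each term to a root form.
 * This is not the attached SinhalaAnalyzer; all word lists are caller-supplied.
 */
public class SinhalaPipelineSketch {

    // Split on anything that is not a letter, a combining mark, or ZWJ (U+200D).
    // Sinhala vowel signs are combining marks and ZWJ joins conjunct forms,
    // so treating either as a separator would break words apart.
    private static final String SEPARATORS = "[^\\p{L}\\p{M}\\u200D]+";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split(SEPARATORS)) {
            // A leading separator yields an empty first element; skip it.
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in stemmer (an assumption, not the SinhalaStemmer algorithm):
    // strip the longest matching suffix, but only if a non-empty stem remains.
    static String stem(String term, List<String> suffixes) {
        String best = "";
        for (String s : suffixes) {
            if (term.endsWith(s) && term.length() > s.length() && s.length() > best.length()) {
                best = s;
            }
        }
        return term.substring(0, term.length() - best.length());
    }

    // Full pipeline: tokenize, drop stop words, stem what remains.
    static List<String> analyze(String text, Set<String> stopWords, List<String> suffixes) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue;
            }
            out.add(stem(token, suffixes));
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only; not a Sinhala stop-word or suffix list.
        Set<String> stopWords = Set.of("and");
        List<String> suffixes = List.of("s", "es", "ing");
        System.out.println(analyze("cats and boxes, walking!", stopWords, suffixes));
    }
}
```

Running main prints `[cat, box, walk]`. The tokenizer keeps combining marks and ZWJ inside tokens because Sinhala vowel signs and conjunct forms such as ශ්‍රී depend on them. No lowercasing step is included, since Sinhala script has no letter case.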



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


