lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bugzi...@apache.org
Subject DO NOT REPLY [Bug 34570] New: - A new Greek Analyzer for Lucene
Date Fri, 22 Apr 2005 10:05:27 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG·
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34570>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND·
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34570

           Summary: A new Greek Analyzer for Lucene
           Product: Lucene
           Version: 1.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: past@ebs.gr


I would like to contribute a greek analyzer for lucene. It is based on the
existing Russian analyzer and features:

- most common greek character sets, such as Unicode, ISO-8859-7 and Windows-1253
- a collection of common greek stop words
- conversion of characters with diacritics (accent, diaeresis) in the lower case
filter, as well as handling of special characters, such as small final sigma

For the character sets I used RFC 1947 (Greek Character Encoding for Electronic
Mail Messages) as a reference. I have incorporated this analyzer in Luke as well
as used it successfully in a recent project of my company (EBS Ltd.).

I hope you will find it a useful addition to the project.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message