lucene-dev mailing list archives

From "Ibrahim (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4293) ArabicRootsAnalyzer
Date Tue, 07 Aug 2012 07:31:02 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ibrahim updated LUCENE-4293:
----------------------------

    Attachment: rootsTableIndex.zip
                ArabicTokens.txt
                ArabicTokenizer.java
                ArabicRootsAnalyzer.java
                ArabicRootFilter.java
    
> ArabicRootsAnalyzer
> -------------------
>
>                 Key: LUCENE-4293
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4293
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Ibrahim
>            Priority: Minor
>         Attachments: ArabicRootFilter.java, ArabicRootsAnalyzer.java, ArabicTokenizer.java, ArabicTokens.txt, rootsTableIndex.zip
>
>
> ArabicRootsAnalyzer uses an index of Arabic terms, each associated with its root. Every
> Arabic word has a root, but there is no automatic way of determining it.
> This Analyzer maps each term to its root, so searching and indexing are based on
> roots. It gives me great results in my application.
> I have attached all the required files along with the db. The problem is the size of the db
> (16MB); it holds around 300,000 terms. I have another db with 600,000 terms, but I believe the
> attached one, which is summarized, is better.
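
The lookup idea described above can be illustrated with a minimal sketch. This is not the attached ArabicRootFilter.java and does not use the Lucene API. The class name RootLookupSketch, the in-memory map, and the three sample entries are hypothetical stand-ins for the attached db, and returning an unknown term unchanged is an assumption about the fallback behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a term-to-root table; not the attached code.
public class RootLookupSketch {
    // Maps a surface form to its root. The real table is said to hold
    // around 300,000 terms; here it is filled by hand.
    private final Map<String, String> termToRoot = new HashMap<>();

    public void add(String term, String root) {
        termToRoot.put(term, root);
    }

    // Returns the root for a known term. Assumption: a term missing from
    // the table is passed through unchanged.
    public String rootOf(String term) {
        return termToRoot.getOrDefault(term, term);
    }

    public static void main(String[] args) {
        RootLookupSketch table = new RootLookupSketch();
        // Sample entries (not taken from the attached db): three words
        // derived from the root k-t-b.
        table.add("كتاب", "كتب");   // book
        table.add("مكتبة", "كتب");  // library
        table.add("كاتب", "كتب");   // writer

        // Indexing and searching on roots means a query for one form
        // matches documents containing the others.
        String indexed = table.rootOf("مكتبة");
        String queried = table.rootOf("كتاب");
        System.out.println(indexed.equals(queried)); // prints true
    }
}
```

Because both the indexed token and the query token are replaced by the same root, they compare equal even though the surface forms differ, which is the matching behaviour the description relies on.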

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

