nutch-dev mailing list archives

From "Dennis Kubes (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool
Date Thu, 17 Dec 2009 16:19:22 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791996#action_12791996 ]

Dennis Kubes commented on NUTCH-666:
------------------------------------

I don't remember exactly what the difference was, but I do remember a subtle
difference between the algorithms that was only noticed after the new tools were
created.  I think it had to do with how the n-grams were handled, or with whether
spaces were taken into account.  Try running the two identifiers side by side; you
will see a considerable difference.
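
To illustrate the kind of difference meant here, below is a minimal sketch of how
whitespace handling alone can change an n-gram profile.  It is not the code of
either identifier: the class NgramSketch and both method names are hypothetical,
and the real tools may differ in other ways as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from Nutch or from the patch.
public class NgramSketch {

    // Variant A: slide a window over the raw text, so n-grams may contain spaces.
    public static List<String> ngramsAcrossSpaces(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // Variant B: split on whitespace first, so no n-gram spans a word boundary.
    public static List<String> ngramsPerWord(String text, int n) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            for (int i = 0; i + n <= word.length(); i++) {
                out.add(word.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "the cat";
        // Same input, two different trigram lists.
        System.out.println(ngramsAcrossSpaces(sample, 3));
        System.out.println(ngramsPerWord(sample, 3));
    }
}
```

For the same input the two variants produce different trigram sets, so language
profiles built one way will not score text the same as profiles built the other way.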

> Analysis plugins for multiple language and new Language Identifier Tool
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-666
>                 URL: https://issues.apache.org/jira/browse/NUTCH-666
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.1
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.1
>
>         Attachments: NUTCH-666-1-20081126.patch, NUTCH-666-2-20091217-nf.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, Russian, and
> Thai.  Also includes a new Language Identifier tool that uses the new indexing framework
> in NUTCH-646.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

