lucene-dev mailing list archives

From Jan Høydahl (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-2839) add alternative language detection impl
Date Sun, 16 Oct 2011 14:22:11 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128407#comment-13128407 ]

Jan Høydahl commented on SOLR-2839:
-----------------------------------

Cool. The reasoning behind returning a list of detected languages was that a more advanced detector
could go sentence by sentence and correctly tag multilingual documents. FAST had that capability.
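To make the sentence-by-sentence idea concrete, here is a minimal Java sketch under stated assumptions. It splits text naively on whitespace after sentence-ending punctuation, guesses a language per sentence with a toy stopword count, and returns the distinct languages in order of first appearance. The class name, the stopword lists, and the scoring are hypothetical stand-ins; this does not reproduce FAST's detector or the language-detection library.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SentenceLanguageTagger {

    // Hypothetical toy model: a few stopwords per language code.
    // A real detector would use statistical (e.g. n-gram) profiles instead.
    static final Map<String, Set<String>> STOPWORDS = Map.of(
        "en", Set.of("the", "and", "is", "of", "this"),
        "de", Set.of("der", "die", "und", "ist", "das"),
        "fr", Set.of("le", "la", "et", "est", "les"));

    // Returns the language whose stopwords occur most often in the sentence,
    // or "unknown" if no stopword matches at all.
    static String detectSentence(String sentence) {
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (e.getValue().contains(token)) hits++;
            }
            if (hits > bestHits) {
                best = e.getKey();
                bestHits = hits;
            }
        }
        return best;
    }

    // Splits naively on whitespace that follows '.', '!' or '?', detects each
    // sentence, and collects the distinct languages in first-seen order.
    static List<String> detectLanguages(String text) {
        Set<String> langs = new LinkedHashSet<>();
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String lang = detectSentence(sentence);
            if (!lang.equals("unknown")) langs.add(lang);
        }
        return new ArrayList<>(langs);
    }

    public static void main(String[] args) {
        String doc = "This is the English part. Das ist der deutsche Teil.";
        System.out.println(detectLanguages(doc)); // prints [en, de]
    }
}
```

Returning a list rather than a single code is what lets a multilingual document be tagged with every language it contains.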

How does this impl compare with the Tika one for short texts? And wouldn't it make more sense
to add this at the Tika level, letting the detection method be configurable? Then all Tika
users would benefit from it.
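One way to read the "configurable detection method" suggestion is a strategy interface plus a name-keyed registry, with the implementation chosen from configuration. The sketch below assumes exactly that design. `DetectionMethod`, the registry keys, and both toy implementations are hypothetical and are not part of Tika's or Solr's actual API.

```java
import java.util.Map;
import java.util.function.Supplier;

public class PluggableDetection {

    // Hypothetical extension point: every detection method implements this.
    interface DetectionMethod {
        String detect(String text);
    }

    // Toy heuristic: guesses from the first Cyrillic or Greek code point,
    // otherwise returns "und" (undetermined).
    static final class ScriptHeuristic implements DetectionMethod {
        public String detect(String text) {
            for (int i = 0; i < text.length(); ) {
                int cp = text.codePointAt(i);
                Character.UnicodeScript script = Character.UnicodeScript.of(cp);
                if (script == Character.UnicodeScript.CYRILLIC) return "ru";
                if (script == Character.UnicodeScript.GREEK) return "el";
                i += Character.charCount(cp);
            }
            return "und";
        }
    }

    // Hypothetical registry: configuration refers to a method only by name.
    static final Map<String, Supplier<DetectionMethod>> REGISTRY = Map.of(
        "script-heuristic", ScriptHeuristic::new,
        "fixed-en", () -> text -> "en"); // stub that always answers "en"

    static DetectionMethod fromConfig(String name) {
        Supplier<DetectionMethod> supplier = REGISTRY.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown detection method: " + name);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : "script-heuristic";
        DetectionMethod method = fromConfig(configured);
        // "\u041f\u0440\u0438\u0432\u0435\u0442" is the Cyrillic word "Privet".
        System.out.println(method.detect("\u041f\u0440\u0438\u0432\u0435\u0442")); // prints ru
    }
}
```

In a real setup, wrappers around actual detectors would be registered under these names, so callers could switch methods by changing a configuration value rather than code.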
                
> add alternative language detection impl
> ---------------------------------------
>
>                 Key: SOLR-2839
>                 URL: https://issues.apache.org/jira/browse/SOLR-2839
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.5, 4.0
>
>         Attachments: SOLR-2839.patch
>
>
> Based on http://code.google.com/p/language-detection (Apache License); supports 53 languages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


