manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1270) Import OpenNLP connector into trunk
Date Tue, 26 Jan 2016 20:45:40 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117964#comment-15117964 ]

Karl Wright commented on CONNECTORS-1270:
-----------------------------------------

[~rafaharo], I tried the connector out, and it looks like the job specification still relies
on a set of file paths.  This is not going to work in a multi-process environment.  I thought
there had been some discussion about including a canned set of model resources in the
jar?  It doesn't look like that was ever done...

The included script downloads five model files locally:

{code}
wget -O ${MODELS_DIR}/en-sent.bin http://opennlp.sourceforge.net/models-1.5/en-sent.bin
wget -O ${MODELS_DIR}/en-token.bin http://opennlp.sourceforge.net/models-1.5/en-token.bin
wget -O ${MODELS_DIR}/en-ner-person.bin http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin
wget -O ${MODELS_DIR}/en-ner-location.bin http://opennlp.sourceforge.net/models-1.5/en-ner-location.bin
wget -O ${MODELS_DIR}/en-ner-organization.bin http://opennlp.sourceforge.net/models-1.5/en-ner-organization.bin
{code}

It seems to me that there are a couple of ways forward.  First possibility: If these models are
accessible by URL, and are licensed in a manner compatible with Apache redistribution, we could
just incorporate them in the build and (for instance) bundle them as resources in the opennlp
connector jar.  Second possibility: We could download the model on the fly in the connector, given
the URL.  For the second possibility to make any sense, though, the download would have to be
done when a connection is configured, not as part of the specification information, which would
mean rearranging the connector somewhat.
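
To make the two possibilities concrete, here is a rough sketch using only the JDK. It is not code from the contributed connector: the class name `ModelSource`, the `/models/` resource prefix, and the cache-directory parameter are all hypothetical, and actually turning the stream or file into an OpenNLP model object is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; not part of ManifoldCF or the contributed connector. */
final class ModelSource {
  // Assumed classpath prefix under which the model files would be packaged in the jar.
  static final String RESOURCE_PREFIX = "/models/";

  private ModelSource() {}

  /** First possibility: open a model bundled as a resource inside the connector jar. */
  static InputStream openBundled(String modelName) throws IOException {
    InputStream in = ModelSource.class.getResourceAsStream(RESOURCE_PREFIX + modelName);
    if (in == null) {
      // getResourceAsStream returns null rather than throwing when the resource is absent.
      throw new IOException("Model not bundled: " + modelName);
    }
    return in;
  }

  /** Second possibility: copy a model from a URL into a local cache directory. */
  static Path fetch(URL modelUrl, Path cacheDir, String modelName) throws IOException {
    Files.createDirectories(cacheDir);
    Path target = cacheDir.resolve(modelName);
    try (InputStream in = modelUrl.openStream()) {
      // Overwrite any earlier copy so that a re-fetch refreshes the cached file.
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
    return target;
  }
}
```

With something like this, the bundled route needs no per-job file paths at all, while the download route would presumably be triggered from the connection configuration with the model URL as a connection parameter.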


> Import OpenNLP connector into trunk
> -----------------------------------
>
>                 Key: CONNECTORS-1270
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1270
>             Project: ManifoldCF
>          Issue Type: Task
>            Reporter: Karl Wright
>            Assignee: Rafa Haro
>             Fix For: ManifoldCF 2.4
>
>
> An OpenNLP connector has been contributed on github.  Need to import it into MCF, first
> to a branch, then to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
