manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1270) Import OpenNLP connector into trunk
Date Tue, 26 Jan 2016 20:45:40 GMT


Karl Wright commented on CONNECTORS-1270:

[~rafaharo], having tried the connector out, I see that the job specification still relies on
a set of local file paths.  This is not going to work in a multi-process environment, since every
process would need the model files at the same paths on its own machine.  I thought there had been
some discussion about including a canned set of model resources in the jar?  It doesn't look like
that was ever done...

The included script downloads five model files locally:

wget -O ${MODELS_DIR}/en-sent.bin
wget -O ${MODELS_DIR}/en-token.bin
wget -O ${MODELS_DIR}/en-ner-person.bin
wget -O ${MODELS_DIR}/en-ner-location.bin
wget -O ${MODELS_DIR}/en-ner-organization.bin
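If these five models were bundled in the connector jar (the first possibility below), the connector could look them up by resource name instead of by filesystem path, so every process would see the same files. This is a minimal sketch. It assumes a hypothetical `/models/` directory inside the jar and a hypothetical helper class `BundledModels`. Neither exists in the contributed connector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the contributed connector): locates the
// OpenNLP model files as classpath resources inside the connector jar
// instead of as local file paths named in the job specification.
final class BundledModels {
    // Resource directory inside the jar; an assumed layout, for illustration only.
    static final String RESOURCE_DIR = "/models/";

    // The five models fetched by the included download script.
    static final List<String> MODEL_FILES = List.of(
        "en-sent.bin",
        "en-token.bin",
        "en-ner-person.bin",
        "en-ner-location.bin",
        "en-ner-organization.bin");

    private BundledModels() {}

    // Opens a bundled model; the caller is responsible for closing the stream.
    static InputStream open(String fileName) throws IOException {
        InputStream in = BundledModels.class.getResourceAsStream(RESOURCE_DIR + fileName);
        if (in == null) {
            throw new IOException("Model resource not found in jar: " + RESOURCE_DIR + fileName);
        }
        return in;
    }

    // Lists which of the expected models are absent from the classpath,
    // so a mis-packaged jar can be detected early.
    static List<String> missingModels() {
        List<String> missing = new ArrayList<>();
        for (String name : MODEL_FILES) {
            if (BundledModels.class.getResource(RESOURCE_DIR + name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

The stream returned by `open` could then be passed to OpenNLP's model constructors, for example `new SentenceModel(in)` or `new TokenNameFinderModel(in)`. Checking `missingModels()` when the connector loads would make a jar that was built without the models fail fast.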

It seems to me that there are two ways forward.  First possibility: if these files are accessible
by URL, and are licensed in a manner compatible with Apache redistribution, we could just
incorporate them in the build and (for instance) bundle them as resources in the OpenNLP connector
jar.  Second possibility: the connector could download the models on the fly, given their URLs.
For the second possibility to make any sense, though, the download would have to happen when a
connection is configured, not as part of the job specification, which would restructure the
connector somewhat.

> Import OpenNLP connector into trunk
> -----------------------------------
>                 Key: CONNECTORS-1270
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Task
>            Reporter: Karl Wright
>            Assignee: Rafa Haro
>             Fix For: ManifoldCF 2.4
> An OpenNLP connector has been contributed on github.  Need to import it into MCF, first
> to a branch, then to trunk.

This message was sent by Atlassian JIRA
