manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1270) Import OpenNLP connector into trunk
Date Wed, 27 Jan 2016 23:01:39 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120333#comment-15120333 ]

Karl Wright edited comment on CONNECTORS-1270 at 1/27/16 11:01 PM:
-------------------------------------------------------------------

[~rafaharo], [~chalitha perera]: I committed code to hook up a revised UI that allows you
to pick a model file from within the file-resources folder, and confirmed that this logic
all basically works.

However, this led me to discover another major problem.  The model instances are all static
variables within OpenNlpExtractorConfig.  That setup makes any kind of UI configuration useless,
because different configurations for different jobs will not be honored; only one such configuration
can be in effect for the lifetime of any ManifoldCF process.  So the current way models are
handled is just plain broken and has to be fixed.
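
To illustrate the problem, here is a minimal Java sketch.  The class and method names are
hypothetical stand-ins, not the connector's actual code, and a plain string stands in for a
loaded model.  It only shows why a static field cannot hold per-job settings, while an
instance field can.

```java
// Hypothetical stand-in for a config class that keeps its model in a static field.
final class StaticModelHolder {
    // One field per JVM: every job sees whatever was assigned last.
    private static String modelPath;

    static void configure(String path) {
        modelPath = path;
    }

    static String current() {
        return modelPath;
    }
}

// Hypothetical per-instance alternative: each holder keeps its own setting.
final class InstanceModelHolder {
    private final String modelPath;

    InstanceModelHolder(String path) {
        this.modelPath = path;
    }

    String current() {
        return modelPath;
    }
}
```

With the static version, configuring a second job silently replaces the first job's model;
with the instance version, two holders can coexist in the same process.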

I understand that this was probably done because model files are expensive to load.
However, that kind of thing is exactly why connections and connection pooling were created.
To make this connector work properly, the models will need to be specified as part of the
connection's configuration information (there is none at all right now), and the connector
will then have no specification information at all, since nothing job-specific will be left
to specify.  This will, of course, require a complete reorganization of the UI and of the
various methods in the connector.

There is another way, but I haven't thought through the implications completely yet, which
is to create a singleton registry of loaded models.  If each model is uniquely identified,
and there is a hash entry keyed by file path, this could work too.
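
A registry along these lines might look like the following Java sketch.  Everything in it
is an assumption made for illustration: the ModelRegistry name, the generic model type M,
and the loader function are hypothetical and do not come from the connector.  A single
shared instance would be held in one static field; the sketch only shows the keyed,
load-once lookup.

```java
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry of loaded models, keyed by normalized file path.
 * M stands in for whatever model type the connector loads.
 */
final class ModelRegistry<M> {
    private final Map<String, M> models = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    /** The loader is called with the normalized path and must not return null. */
    ModelRegistry(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the model for the given path, loading it on first request only. */
    M get(String path) {
        // Normalize so that different spellings of one path share one entry.
        String key = Paths.get(path).toAbsolutePath().normalize().toString();
        // computeIfAbsent runs the loader at most once per key, even under concurrency.
        return models.computeIfAbsent(key, loader);
    }

    /** Number of distinct models currently loaded. */
    int size() {
        return models.size();
    }
}
```

Because each job would look its model up by path, two jobs with different model files get
different instances, while jobs sharing a file pay the load cost once.  The sketch does not
address eviction or reloading when a model file changes on disk.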

I am willing to tackle this as well, but it will likely take a couple of days to complete
that task.

Another thing I discovered is that the JavaScript for the UI is weak and does not check for
models that have not been selected.  This too must be fixed, but that is a smaller job.



> Import OpenNLP connector into trunk
> -----------------------------------
>
>                 Key: CONNECTORS-1270
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1270
>             Project: ManifoldCF
>          Issue Type: Task
>            Reporter: Karl Wright
>            Assignee: Rafa Haro
>             Fix For: ManifoldCF 2.4
>
>
> An OpenNLP connector has been contributed on GitHub.  Need to import it into MCF, first
> to a branch, then to trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
