ctakes-notifications mailing list archives

From "jay vyas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-320) Methods used by getDefaultPipeline should be able to load reasonable defaults without expecting external files.
Date Sun, 23 Nov 2014 01:23:12 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222270#comment-14222270 ]

jay vyas commented on CTAKES-320:
---------------------------------

Hi cTAKES. FYI, I now have a persistent framework for storing tweets in Cassandra. I'm going
to modify it to support data in a Solr index as well, and then we can choose which
persistence framework to use for storing Twitter data.

The code is at https://github.com/jayunit100/SparkBlueprint/ and I will commit it to the
cTAKES sandbox once it's feature-equivalent to the original Twitter streaming POS tagger.

At that point, we can run it continuously and evolve the analysis algorithms as a separate batch
job on the same system, knowing that our persistent data store will be easy to query.

> Methods used by getDefaultPipeline should be able to load reasonable defaults without expecting external files.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: CTAKES-320
>                 URL: https://issues.apache.org/jira/browse/CTAKES-320
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-clinical-pipeline
>    Affects Versions: 3.2.0
>            Reporter: jay vyas
>             Fix For: 3.2.2
>
>
> In CTAKES-314, it's evident that some *simple pipelines break* because of files which
are *expected to be unpacked*.
> Also, one of the builders wants {{org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml}}
to be in place *locally*, at runtime.
> *FIX* 
> - create *some unit tests* which ensure that *some reasonable defaults* can be loaded
>  - This can be done using {{Class.getResource...}} and embedding some term dictionaries
in jars, or
>  - maybe by making a *direct URL connection on the fly to a stable server somewhere*,
so it can download some reasonable defaults for external data files into {{/tmp/}} or something,
with a log warning that it wasn't able to find local resources.
> *SUMMARY* 
> Generally, relying on *unpacked jars* is not going to work in a *distributed environment*.
It also fails in an *IDE* where we are directly relying on jars pulled in from mvn... so
let's come up with something a little more resilient so we can easily run cTAKES at scale :)
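
The first option under FIX in the description above (defaults embedded in jars and loaded
through Class.getResource, with a log warning when no local copy is found) could be sketched
roughly as follows. This is only an illustration under stated assumptions: the class name
DefaultResourceLoader and the method open are hypothetical and are not part of cTAKES, and
the download-to-/tmp variant is left out because the issue does not name a server.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.logging.Logger;

/** Hypothetical helper for illustration only; not an existing cTAKES class. */
final class DefaultResourceLoader {
    private static final Logger LOG =
            Logger.getLogger(DefaultResourceLoader.class.getName());

    private DefaultResourceLoader() {}

    /**
     * Opens the resource from the local file system if it exists there,
     * otherwise falls back to a copy bundled on the classpath (for example
     * inside a jar), logging a warning that no local copy was found.
     */
    public static InputStream open(String relativePath) {
        // 1. Prefer an unpacked local file, which is what the pipelines expect today.
        File local = new File(relativePath);
        if (local.isFile()) {
            try {
                return Files.newInputStream(local.toPath());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // 2. Fall back to a default shipped on the classpath. The leading "/"
        //    makes Class.getResourceAsStream treat the name as absolute.
        InputStream bundled =
                DefaultResourceLoader.class.getResourceAsStream("/" + relativePath);
        if (bundled != null) {
            LOG.warning("No local copy of " + relativePath
                    + "; using the default bundled on the classpath");
            return bundled;
        }

        // 3. Nothing found anywhere: fail with a message that names the resource.
        throw new IllegalArgumentException(
                "No local file or classpath resource named " + relativePath);
    }
}
```

A builder could then call
DefaultResourceLoader.open("org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml"),
assuming that descriptor is packaged in a jar on the classpath, and would no longer depend
on the jar having been unpacked first.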




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
