ctakes-dev mailing list archives

From Tomasz Oliwa <ol...@uchicago.edu>
Subject deserialize and process XCAS files
Date Wed, 14 Sep 2016 18:08:01 GMT
Hi,

I have working code that deserializes XCAS files and then processes them read-only. It is based
on CASConsumerTestDriver.java. An example:

        // inputs to the CAS file and the AE from cTAKES, templates here
        String xCasLocation = <location-of-CAS-file>;
        String taeDescriptionLocation = <location-of-AggregatePlaintextFastUMLSProcessor.xml>;

        // initialize the AE
        AnalysisEngineDescription taeDescription = UIMAFramework.getXMLParser().parseAnalysisEngineDescription(
                new XMLInputSource(new File(taeDescriptionLocation)));
        AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(taeDescription);

        // read CAS; try-with-resources closes the stream once deserialization is done
        CAS cas = ae.newCAS();
        try (InputStream xCasStream = new FileInputStream(xCasLocation)) {
            XCASDeserializer.deserialize(xCasStream, cas);
        }
        
        // print out the Sofa
        System.out.println(cas.getSofaDataString());
        
        // create jCAS and print out some UmlsConcepts
        JFSIndexRepository indexes = cas.getJCas().getJFSIndexRepository();
        Iterator iterator = indexes.getAnnotationIndex(SignSymptomMention.type).iterator();
        while (iterator.hasNext()) {
            SignSymptomMention annot = (SignSymptomMention) iterator.next();
            System.out.println(annot.getCoveredText());
            // further read the annotation
            FSArray ocArr = annot.getOntologyConceptArr();
            // ...
        }

The code above runs fine, but runs sequentially. I have a lot of CAS files and would like
to process them in parallel (for instance to extract some values and store them in another
DB).

My question:

Can I pass a reference to the AnalysisEngine ae created above to code that runs in parallel
(java.util.concurrent.Callable or parallel Java 8 Streams, it does not matter), provided that
I only use read operations (such as annot.getCoveredText() or other calls to get the
CUI) and no two threads work on the same CAS?

I read in https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Component+Use+Guide
that "cTAKES is not designed to be thread safe", but here I would be doing read-only operations
to extract concepts and CUIs from JCas objects. No new annotations would be created, no annotators
called.
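
As a rough illustration of the parallel setup described above, the following is a minimal
sketch that uses only java.util.concurrent. The class name ParallelCasSketch, the method
processAll and the extractor parameter are hypothetical names introduced for this example.
The UIMA calls are deliberately left out: the extractor stands in for the code that would
deserialize one XCAS file into its own CAS and read its annotations. The sketch does not show
that sharing ae between threads is safe, which is the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelCasSketch {

    // Runs the (hypothetical) extractor once per file location on a fixed-size
    // thread pool and returns the per-file results in input order.
    public static <R> List<R> processAll(List<String> xCasLocations,
                                         Function<String, R> extractor,
                                         int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String location : xCasLocations) {
                // One task per file, so no two threads work on the same CAS.
                tasks.add(() -> extractor.apply(location));
            }
            List<R> results = new ArrayList<>();
            // invokeAll waits for all tasks and returns futures in task order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in extractor (assumption): the real one would deserialize the
        // XCAS file and collect covered text or CUIs from its annotations.
        List<String> out = processAll(
                Arrays.asList("a.xcas", "b.xcas"), loc -> "read " + loc, 2);
        System.out.println(out);
    }
}
```

In this arrangement each task would create and deserialize its own CAS inside the extractor,
so the only shared object would be whatever the extractor captures, for instance ae.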

If this is not recommended, what would be the best course of action to deserialize and read-only
process these CAS files?

Thanks for any help. I would really appreciate it.
Tomasz