uima-user mailing list archives

From Matthew Campbell <mcampb...@syrres.com>
Subject Multi-Document Processing
Date Mon, 20 Aug 2007 21:27:59 GMT
Hey folks:

    I'm looking at a process that runs each document through a bunch of 
annotators to tag up various information, then I need to do some 
processing/manipulation of those documents based on the information held in 
the whole collection.  I've been reading up on the CPE, but it looks 
like it's primarily for running a collection of documents through an 
AE.  I was hoping someone could point me in the right direction for 
doing the collection-wide processing portion of my process.

    I had started out by defining the process as one large aggregate AE 
and running each document through it, but I don't see a way to go 
through that initial tagging process for all documents and then move on 
to the next phase.
    I then switched gears and tried splitting up each phase into its 
own AE, but then I lose the complex Sofa mappings I had put together 
for the previous attempt.  So I guess this could be solved in two ways - 
one would be that the CPE has some sort of built-in method for doing 
collection-wide processing and manipulation (i.e., "first identify all 
location names in all documents, then replace each with a new name, but 
make sure the new name doesn't appear in any other document").  The 
other would be to somehow run through the first phase to identify 
everything, process the resulting collection of JCas objects, then 
pump each JCas into a second AE for doing post-processing stuff.  
Somewhere in there would have to be some dynamically-mapped Sofas from 
the phase 1 AE to the phase 2 AE.
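
    To make the second option concrete, here is a minimal sketch of that 
three-step shape in plain Java. It uses no UIMA API at all: documents are 
plain strings, and the class name, the method names, the gazetteer lookup 
that stands in for the phase 1 annotators, and the "PlaceN" naming scheme 
are all hypothetical, chosen only for illustration. The matching is naive 
substring matching, so this shows the control flow rather than a real 
implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; nothing here is part of UIMA.
class TwoPhaseRename {

    // Phase 1 stand-in for the per-document annotators: report which known
    // location names occur in one document (naive substring matching).
    static Set<String> findLocations(String doc, Set<String> gazetteer) {
        Set<String> found = new TreeSet<>();
        for (String name : gazetteer) {
            if (doc.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    // True if the text occurs in at least one document of the collection.
    static boolean appearsInAny(List<String> docs, String text) {
        for (String doc : docs) {
            if (doc.contains(text)) {
                return true;
            }
        }
        return false;
    }

    // Collection-wide step: pick one replacement per location, skipping any
    // candidate that already occurs somewhere in the collection.
    static Map<String, String> chooseReplacements(List<String> docs, Set<String> locations) {
        Map<String, String> mapping = new LinkedHashMap<>();
        int counter = 0;
        for (String location : locations) {
            String candidate;
            do {
                candidate = "Place" + counter++;
            } while (appearsInAny(docs, candidate));
            mapping.put(location, candidate);
        }
        return mapping;
    }

    // Phase 2: apply the collection-wide mapping to a single document.
    static String rewrite(String doc, Map<String, String> mapping) {
        String out = doc;
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            out = out.replace(entry.getKey(), entry.getValue());
        }
        return out;
    }

    // Runs phase 1 over every document before phase 2 starts on any of them.
    static List<String> process(List<String> docs, Set<String> gazetteer) {
        Set<String> allLocations = new TreeSet<>();
        for (String doc : docs) {
            allLocations.addAll(findLocations(doc, gazetteer));
        }
        Map<String, String> mapping = chooseReplacements(docs, allLocations);
        List<String> result = new ArrayList<>();
        for (String doc : docs) {
            result.add(rewrite(doc, mapping));
        }
        return result;
    }
}
```

    The point of the sketch is the barrier in process(): no document is 
rewritten until every document has been scanned, which is the part I 
don't see how to express with a single aggregate AE.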

    I hope that described my goal well enough, and thanks ahead of time 
for any pointers you guys can throw my way.

