uima-user mailing list archives

From Radwen ANIBA <arad...@gmail.com>
Subject Using the Cas to compare documents
Date Thu, 25 Jun 2009 09:35:53 GMT
Hi everyone,

Following the example UIMA applications helps us understand how each
component of the UIMA framework works. That's great. But one question that a
developer may ask is how to use the CAS to compare analyzed documents.

The CAS is common to every document, and when analyzing one of them we have
access to the CAS for writing or updating.
Let's imagine we have 3 documents to analyze. We write metadata about each of
them to the CAS, but to go further in the analysis it could be very
interesting to compare these documents using the CAS, either all together or
pairwise.

To illustrate what I'm saying, let's imagine we are looking for email
addresses inside three big documents using UIMA's regexp capabilities.
A result might look like this:

Document 1: Number of unique emails: 9
            Emails in common with Document 2: 10
            Emails in common with Document 3: 6
Document 2: Number of unique emails: 5
            Emails in common with Document 1: 20
            Emails in common with Document 3: 1
Document 3: Number of unique emails: 4
            Emails in common with Document 1: 15
            Emails in common with Document 2: 3
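
To make the kind of result above concrete, here is a minimal sketch in plain
Java, outside of UIMA. It assumes the email strings found in each document
have already been collected into one set per document; how they get there
(for example from annotations produced by a regexp annotator) is left open.
The class name EmailOverlap, its method names, and the simplified email
pattern are all hypothetical and only for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailOverlap {
    // Deliberately simplified email pattern (an assumption, not a full RFC 5322 matcher).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Collects the distinct, lower-cased email addresses found in a text.
    public static Set<String> extractEmails(String text) {
        Set<String> found = new TreeSet<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group().toLowerCase(Locale.ROOT));
        }
        return found;
    }

    // Number of addresses that appear in both sets (set intersection).
    public static int commonCount(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Number of addresses that appear in docId and in no other document.
    public static int uniqueCount(String docId, Map<String, Set<String>> emailsByDoc) {
        Set<String> unique = new HashSet<>(emailsByDoc.get(docId));
        for (Map.Entry<String, Set<String>> e : emailsByDoc.entrySet()) {
            if (!e.getKey().equals(docId)) {
                unique.removeAll(e.getValue());
            }
        }
        return unique.size();
    }

    public static void main(String[] args) {
        // Tiny made-up inputs standing in for the three big documents.
        Map<String, Set<String>> emailsByDoc = new LinkedHashMap<>();
        emailsByDoc.put("Document 1", extractEmails("a@x.org, b@x.org and c@x.org"));
        emailsByDoc.put("Document 2", extractEmails("b@x.org wrote to d@y.net"));
        emailsByDoc.put("Document 3", extractEmails("c@x.org and d@y.net"));

        for (String doc : emailsByDoc.keySet()) {
            System.out.println(doc + ": unique emails = " + uniqueCount(doc, emailsByDoc));
            for (String other : emailsByDoc.keySet()) {
                if (!other.equals(doc)) {
                    int n = commonCount(emailsByDoc.get(doc), emailsByDoc.get(other));
                    System.out.println("  in common with " + other + ": " + n);
                }
            }
        }
    }
}
```

With sets like these, the count in common between two documents is the same
in both directions, so only one value per pair needs to be stored.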

This is a simple pairwise cross-comparison of documents using the CAS. My
question is: how can this be achieved?
Do we need to create an additional type system for the shared information?
Would we have to build it dynamically, on the fly?

Thanks

Rad
