uima-user mailing list archives

From Peter Klügl <pklu...@ki.informatik.uni-wuerzburg.de>
Subject Re: CAS Viewer
Date Fri, 09 Jan 2009 17:37:35 GMT
Hi Tong,
> When processing input files that contain HTML tags, most annotators will
> "clean up" the HTML tags before doing any further processing. As a result,
> the xmiCAS doesn't contain the original HTML text anymore.
>   
Ah, OK. Visual and layout information is quite important for my 
extraction tasks. My rule language can dynamically filter all kinds 
and combinations of markup and annotation types. Therefore the 
original HTML text stays the main artifact in the xmiCAS, even if the 
tags contain no valuable information. I plan to integrate "external" 
annotators in the same manner, by restricting what they can see.
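
As a rough illustration of the filtering idea described above (this is
not the rule language or any UIMA API), a minimal Python sketch can keep
the original HTML as the document text and treat tags as just another
annotation type that is hidden on demand. The names Annotation,
annotate, and visible are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    type: str   # hypothetical type name: "Markup" or "Word"
    begin: int  # offsets into the ORIGINAL, unmodified HTML text
    end: int


TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"\w+")


def annotate(html):
    """Annotate tags and words without altering the HTML text."""
    annotations = []
    pos = 0
    for tag in TAG.finditer(html):
        # Words located between the previous tag and this one.
        for word in WORD.finditer(html, pos, tag.start()):
            annotations.append(Annotation("Word", word.start(), word.end()))
        annotations.append(Annotation("Markup", tag.start(), tag.end()))
        pos = tag.end()
    # Words after the last tag.
    for word in WORD.finditer(html, pos):
        annotations.append(Annotation("Word", word.start(), word.end()))
    return annotations


def visible(annotations, filtered_types):
    """Return the annotations whose type is not currently filtered."""
    return [a for a in annotations if a.type not in filtered_types]


html = "<p>Visual <b>layout</b> matters</p>"
annotations = annotate(html)

# With markup filtered, downstream processing sees only the words,
# while every offset still points into the original HTML.
words = [html[a.begin:a.end] for a in visible(annotations, {"Markup"})]
```

Because the filter is applied at read time, the same annotations can be
viewed with or without markup, and the layout information is never lost.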
> I think the most useful feature of your plug-in is its capability to allow
> users to edit the xmiCAS in the browser window similar to editing the HTML
> page with an HTML Editor (Please correct me if I am wrong).
>   
I am not sure I understand you. The CEV plugin cannot modify the 
structure or text of the HTML (the rule language does such things). I 
think the only real advantage over the CAS Viewer and the CAS Editor 
is that the CEV can display the annotations of an HTML artifact in 
some kind of browser, and the user can create new annotations in this 
browser. It is really painful to review or edit annotations in the 
HTML source. There is probably no reason (except maybe the extension 
point) to use the CEV plugin instead of the CAS Viewer if you are just 
processing plain text.
> Having some xmiCAS samples will help us to understand the plug-in's
> capability.
>   
Yes, I will provide a simple example next week.

Have a nice weekend!

Peter

