ctakes-dev mailing list archives

From "AndyMC@apache.org (Andy McMurry)" <mcmurry.a...@gmail.com>
Subject Fwd: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
Date Wed, 27 Aug 2014 06:39:35 GMT
Interesting thread in UIMA core about JSON serialization of CASs and descriptors.

Begin forwarded message:

> From: Marshall Schor <msa@schor.com>
> Subject: Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
> Date: August 25, 2014 at 8:33:54 PM PDT
> To: dev@uima.apache.org
> Reply-To: dev@uima.apache.org
>
> On 8/25/2014 6:54 PM, Jens Grivolla wrote:
>> Is the JSON serialization documented somewhere?
> Yes, there's a chapter in the reference book.  You can build that
> (uima-docbook-references), until it's released.
> There are also lots of Javadocs in the main implementing class:
> XmiCasSerializer.  (It's in this class because it shares a lot of the machinery
> with Xmi serialization).
>> I saw that there appear to be quite a few alternative serializations. It
>> seems to include something like a typesystem definition, but only with a
>> list of feature names, not their types, if I understood the format
>> correctly (@featureRefs has a list of the features that are not of
>> primitive types, it seems).
> The @featureRefs entry lists only those features which are "references" to
> other feature structures.
> You're correct in noticing that the feature "range" types are not present.
> This is because the serialization is to JSON, which has a native
> representation for collections (JSON arrays), which could be UIMA
> Arrays or Lists, and ranges that are boolean are representable by JSON true and
> false values.  There is no distinction that a number is a byte/short/int/long,
> because those are all represented as a JSON "number".  And so forth...
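A small Python sketch of the point above: once values are in JSON, the original UIMA range type cannot be recovered from the value alone. The slot names (aByte, aLong, isFinal, ids) are hypothetical and are not taken from any real type system or from actual UIMA output.

```python
import json

# Hypothetical slots: in a UIMA type system these might be declared as
# byte, long, boolean and an integer array. In JSON they are just
# "number", "true/false" and "array".
serialized = '{"aByte": 7, "aLong": 7000000000, "isFinal": true, "ids": [1, 2, 3]}'

slots = json.loads(serialized)

# Record the Python type each JSON value was parsed into.
kinds = {name: type(value).__name__ for name, value in slots.items()}
print(kinds)
```

Both numeric slots come back as the same Python type, so a reader of the JSON that needs the exact range type would have to get it from somewhere else.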
> The JSON serialization for a CAS can optionally include parts of the type
> system.  It can include the supertypes of the serialized types (to enable
> iterating over a type and all of its subtypes, as CAS iterators normally do);
> it can also identify which slots that appear to hold number values are actually
> to be interpreted as references to other feature structures.  Otherwise, the
> serialized form might have a slot "foo" : 111, which is a number value, and a
> slot "bar" : 112, which is a reference to another feature structure whose ID is
> 112.  This extra information (in @featureRefs) gives the user of the JSON
> serialized form a way to distinguish these two cases.
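To make the "foo" : 111 versus "bar" : 112 case concrete, here is a minimal Python sketch of how a consumer might use @featureRefs. The overall JSON layout (a "featureStructures" map keyed by ID, "@type" on each entry, @featureRefs keyed by type name) and the helper feature_value are assumptions made for illustration; the actual layout is the one described in the UIMA reference documentation.

```python
import json

# Hypothetical layout, for illustration only.
doc = json.loads("""
{
  "@featureRefs": {"Token": ["bar"]},
  "featureStructures": {
    "111": {"@type": "Token", "foo": 111, "bar": 112},
    "112": {"@type": "Token", "foo": 5,   "bar": 111}
  }
}
""")

def feature_value(doc, fs_id, feature):
    """Return a plain value, or the referenced feature structure
    when the feature is listed in @featureRefs for its type."""
    fs = doc["featureStructures"][str(fs_id)]
    raw = fs[feature]
    ref_features = doc["@featureRefs"].get(fs["@type"], [])
    if feature in ref_features:
        # The number is an ID: follow it to the target feature structure.
        return doc["featureStructures"][str(raw)]
    # Otherwise the number is just a number.
    return raw

plain = feature_value(doc, 111, "foo")     # stays a number
target = feature_value(doc, 111, "bar")    # resolved to feature structure 112
print(plain, target["foo"])
```

Without the @featureRefs entry, both slots would look like ordinary numbers and the lookup above would have nothing to go on.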
>> It would be very useful if the serialization allowed one to easily pull out
>> a partial CAS with just a subset of the views (by only including some
>> subtrees of the JSON structure), and merge views into it.
> Another optional part of the serialization is a list of views, together with an
> array of numbers each one of which represents a serialized Feature Structure
> that is indexed in that view.
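As a rough sketch of how such a view list could support pulling out a partial CAS, the Python below keeps only the requested views and the feature structures indexed in them. The key names ("views", "featureStructures"), the view and type names, and the helper extract_views are all hypothetical; the sketch also does not follow references from the kept feature structures to others, which a real extraction would probably need to do.

```python
import json

# Hypothetical layout: each view maps to the IDs of the feature
# structures indexed in it.
cas = json.loads("""
{
  "views": {
    "_InitialView": [1, 2],
    "translation": [2, 3]
  },
  "featureStructures": {
    "1": {"@type": "Token", "begin": 0, "end": 5},
    "2": {"@type": "DocumentMeta", "language": "en"},
    "3": {"@type": "Token", "begin": 0, "end": 4}
  }
}
""")

def extract_views(cas, view_names):
    """Keep only the named views and the feature structures indexed in them."""
    views = {name: cas["views"][name] for name in view_names}
    wanted_ids = {fs_id for ids in views.values() for fs_id in ids}
    structures = {
        str(fs_id): cas["featureStructures"][str(fs_id)]
        for fs_id in sorted(wanted_ids)
    }
    return {"views": views, "featureStructures": structures}

partial = extract_views(cas, ["translation"])
print(json.dumps(partial, indent=2))
```

In this made-up data, feature structure 2 is indexed in both views, so it is carried along with whichever view is extracted.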
>> This might be
>> complicated, as I understand that the views define annotation indices, but
>> the same annotation can be indexed in several views, right?
> Feature Structures can be classified into "Annotations" and other types (not a
> subtype of Annotation).
> Annotations are special - they have an implied reference to a particular subject
> of analysis.  So they are restricted to being indexed in the view that is
> associated with that subject-of-analysis.
> Other types (not subtypes of Annotation (or more precisely, AnnotationBase)) do
> not have this restriction, and can be indexed in multiple views.
> See
> http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aas.annotations_associated_sofa.
> Let me know where the documentation might be improved :-)
> -Marshall
>> -- Jens
