ctakes-dev mailing list archives

From John Green <john.travis.gr...@gmail.com>
Subject Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
Date Wed, 27 Aug 2014 16:30:12 GMT
Very good


On Wed, Aug 27, 2014 at 2:39 AM, AndyMC@apache.org (Andy McMurry) <
mcmurry.andy@gmail.com> wrote:

> Interesting thread in UIMA core about JSON serialization for CASs and
> descriptors.
>
>
> Begin forwarded message:
>
> > From: Marshall Schor <msa@schor.com>
> > Subject: Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
> > Date: August 25, 2014 at 8:33:54 PM PDT
> > To: dev@uima.apache.org
> > Reply-To: dev@uima.apache.org
> >
> >
> > On 8/25/2014 6:54 PM, Jens Grivolla wrote:
> >> Is the JSON serialization documented somewhere?
> > Yes, there's a chapter in the reference book.  You can build that
> > (uima-docbook-references), until it's released.
> >
> > There are also lots of Javadocs in the main implementing class,
> > XmiCasSerializer.  (It's in this class because it shares a lot of the
> > machinery with Xmi serialization.)
> >
> >>
> >> I saw that there appear to be quite a few alternative serializations. It
> >> seems to include something like a typesystem definition, but only with a
> >> list of feature names, not their types, if I understood the format
> >> correctly (@featureRefs has a list of the features that are not of
> >> primitive types, it seems).
> > The @featureRefs list contains only those features that are "references"
> > to other feature structures.
> >
> > You're correct in noticing that the feature "range" types are not present.
> > This is because the serialization is to JSON, which natively represents
> > collections as JSON arrays (these could be UIMA Arrays or Lists) and
> > boolean ranges as JSON true and false values.  There is no distinction
> > between byte/short/int/long, because those are all represented as a
> > JSON "number".  And so forth...
> >
> > The JSON serialization for a CAS can optionally include parts of the type
> > system.  It can include the supertypes of the serialized types (to enable
> > iterating over a type and all of its subtypes, as CAS iterators normally
> > do).  It can also identify which slots that appear to hold number values
> > are actually references to other feature structures.  Otherwise, the
> > serialized form might have a slot "foo" : 111, which is a number value,
> > and a slot "bar" : 112, which is a reference to another feature structure
> > whose ID is 112.  The extra information in @featureRefs gives the user of
> > the JSON serialized form a way to distinguish these two cases.
> >
> >>
> >> It would be very useful if the serialization allowed one to easily pull
> >> out a partial CAS with just a subset of the views (by only including some
> >> subtrees of the JSON structure), and merge views into it.
> > Another optional part of the serialization is a list of views, together
> > with an array of numbers, each of which represents a serialized Feature
> > Structure that is indexed in that view.
> >> This might be
> >> complicated, as I understand that the views define annotation indices,
> >> but the same annotation can be indexed in several views, right?
> >
> > Feature Structures can be classified into "Annotations" and other types
> > (those that are not a subtype of Annotation).
> >
> > Annotations are special: they have an implied reference to a particular
> > subject of analysis.  So they are restricted to being indexed in the view
> > that is associated with that subject of analysis.
> >
> > Other types (not subtypes of Annotation, or more precisely,
> > AnnotationBase) do not have this restriction, and can be indexed in
> > multiple views.
> >
> > See
> >
> > http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aas.annotations_associated_sofa
> >
> > Let me know where the documentation might be improved :-)
> >
> > -Marshall
> >>
> >> -- Jens
> >>
> >>
> >>
> >
>
>
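
To make the two points in the quoted message concrete (the @featureRefs
list, and the per-view arrays of feature structure IDs), here is a minimal
Python sketch. The JSON layout is a simplified assumption made up for
illustration, not the actual UIMA output format: the keys "@context",
"views" and "feature_structures", the type names, and the helpers resolve()
and extract_views() are all hypothetical. Only the "foo" : 111 and
"bar" : 112 slots come from the message itself.

```python
import json

# Hypothetical, simplified layout; NOT the exact UIMA JSON format.
doc = json.loads("""
{
  "@context": {
    "Token": {"@featureRefs": ["bar"]}
  },
  "views": {
    "_InitialView": [111, 112],
    "OtherView": [112, 113]
  },
  "feature_structures": {
    "111": {"@type": "Token", "foo": 111, "bar": 112},
    "112": {"@type": "Note", "text": "shared"},
    "113": {"@type": "Note", "text": "other"}
  }
}
""")


def resolve(doc, fs_id):
    """Return a copy of one feature structure with its reference slots
    replaced by the feature structures they point to."""
    fs = doc["feature_structures"][str(fs_id)]
    type_info = doc["@context"].get(fs["@type"], {})
    ref_names = set(type_info.get("@featureRefs", []))
    out = {}
    for name, value in fs.items():
        if name in ref_names and value is not None:
            # Listed in @featureRefs: the number is an ID, not a value.
            out[name] = doc["feature_structures"][str(value)]
        else:
            # Not listed: the number (or string, boolean) is a plain value.
            out[name] = value
    return out


def extract_views(doc, wanted):
    """Keep only the named views and the feature structures indexed in
    them. References to structures outside those views are not followed."""
    views = {name: ids for name, ids in doc["views"].items() if name in wanted}
    keep = {str(i) for ids in views.values() for i in ids}
    fss = {k: v for k, v in doc["feature_structures"].items() if k in keep}
    return {"views": views, "feature_structures": fss}


resolved = resolve(doc, 111)
partial = extract_views(doc, {"OtherView"})
```

Here resolved["foo"] stays the number 111, while resolved["bar"] becomes the
feature structure whose ID is 112. The partial result keeps only the
structures indexed in "OtherView"; a real extraction would probably also
need to pull in anything those structures reference.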
