asterixdb-dev mailing list archives

From Xikui Wang <xik...@uci.edu>
Subject Re: Questions of building record in AsterixDB
Date Sun, 01 May 2016 01:10:19 GMT
Hi Mike,

Thanks for pointing that out. I think I misunderstood the mechanism
and misused the terms 'dataset' and 'datatype'. Sorry about that.

Best,
Xikui

On Sat, Apr 30, 2016 at 4:30 PM, Mike Carey <dtabass@gmail.com> wrote:

> One nit:  This has nothing to do with any dataset definition, on the parser
> side of things - it's the type parameter on the create feed DDL statement
> that should be the parser's guide.  (In general the optional function on
> the feed may change the type by the time the data reaches a dataset.)
> On Apr 30, 2016 3:26 PM, "Xikui Wang" <xikuiw@uci.edu> wrote:
>
> > Hi Abdullah,
> >
> > Actually I also have the concern that adding a null check for the
> > general case will bring extra overhead. Thus I plan to add the check
> > after the parser but before addTuple, i.e. in
> > FeedRecordDataFlowController. But based on what I have seen so far, it
> > seems RecordType is transparent to FeedRecordDataFlowController. So I
> > am still investigating that...
> >
> > I saw the null check in the ADM parser. That's actually a viable way to
> > handle it within the parser scope. But I am looking for a slightly
> > different solution. From my perspective, the ADM parser assumes the
> > input ADM should conform to the dataset definition, so it's reasonable
> > for it to throw an exception. For the Tweetparser, if I see a null
> > value on a non-null attribute, I will discard the whole tweet directly,
> > and may not even log it (as there are too many tweets with nulls).
> > That's the reason why I want to put the check in
> > FeedRecordDataFlowController, since I didn't see a good way to prevent
> > a record insert in the parser other than throwing an exception.
> >
> > Not sure whether my opinion makes sense. Feel free to comment. :)
> >
> > Best,
> > Xikui
> >
> > On Sat, Apr 30, 2016 at 1:52 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
> >
> > > Adding a few points here:
> > >
> > > My feeling is SerializerDeserializer offers another level of
> > > abstraction, but with output I can write values directly without
> > > constructing an AType object. I am wondering if there are any
> > > preferences over these two?
> > >
> > > - Using the SerializerDeserializer option, you will only create a
> > > single object regardless of the number of parsed records, so I
> > > wouldn't worry about it. Code maintainability takes precedence here
> > > IMO.
> > > - In addition to records and lists, UTF8StringSerializerDeserializer
> > > can be stateful for the same reason (to avoid creating lots of
> > > unneeded objects). In fact, our parsers use the stateful
> > > UTF8StringSerializerDeserializer since I noticed that using the
> > > stateless one creates lots of byte[] and triggers GC over and over.
> > > - Right now, we parse missing values as null. Should that change?
> > > - There is definitely a check for nulls on non-nullable values, at
> > > least in the ADM parser. There might be a bug, however, that makes it
> > > accept explicit null values, and that should be fixed.
> > >
> > > I am for NOT using the cast record solution because of the overhead
> > > it will add, but that is just me :)
> > > ~Abdullah.
> > >
> > >
> > > On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <xikuiw@uci.edu> wrote:
> > >
> > > > > Thank you Yingyi. I will try to figure out a solution from that
> > > > > direction.
> > > >
> > > > Best,
> > > > Xikui
> > > >
> > > > > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> > > >
> > > > > Yeah, I think so:-)
> > > > >
> > > > > Best,
> > > > > Yingyi
> > > > >
> > > > > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <dtabass@gmail.com> wrote:
> > > > >
> > > > > > This indeed might be cleaner?
> > > > > >
> > > > > >
> > > > > > On 4/29/16 3:28 PM, Yingyi Bu wrote:
> > > > > >
> > > > > > >>> I'm guessing that you can do similar things to
> > > > > > >>> CastRecordDescriptor if you want to handle general cases
> > > > > > >>> in that region.
> > > > > > >>
> > > > > > >> Or, you can inject a cast-record function in the loading
> > > > > > >> pipeline so that you can defer the runtime-type-check/cast
> > > > > > >> to that function instead of doing that in the parser.
> > > > > >>
> > > > > >>
> > > > > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> > > > > >>
> > > > > > >>> My answer is inlined.
> > > > > > >>>
> > > > > > >>>> My feeling is SerializerDeserializer offers another level
> > > > > > >>>> of abstraction, but with output I can write values
> > > > > > >>>> directly without constructing an AType object.
> > > > > > >>>> I am wondering if there are any preferences over these
> > > > > > >>>> two?
> > > > > > >>>
> > > > > > >>> I agree with you. However, a SerializerDeserializer has to
> > > > > > >>> be stateless, hence it cannot be used at runtime for
> > > > > > >>> complex type objects such as records and lists,
> > > > > > >>> because it will create a lot of Java objects.
> > > > > > >>>
> > > > > > >>>> In other words, the parser has to guarantee that the
> > > > > > >>>> processed records match the dataset definition (a
> > > > > > >>>> non-optional attribute cannot have a null value). I tried
> > > > > > >>>> assigning a null value to a non-null attribute. It is
> > > > > > >>>> inserted successfully, but reading the records back has
> > > > > > >>>> problems.
> > > > > > >>>
> > > > > > >>> That sounds right to me.  Please file a JIRA issue and
> > > > > > >>> assign it to yourself (if you're working on that).
> > > > > > >>> I'm guessing that you can do similar things to
> > > > > > >>> CastRecordDescriptor if you want to handle general cases
> > > > > > >>> in that region.
> > > > > > >>>
> > > > > > >>>> 3. Set to null or skip
> > > > > > >>>> For optional (nullable) attributes, if I want to insert a
> > > > > > >>>> record with a null value on that attribute, should I
> > > > > > >>>> assign a null value or should I just skip it? (Probably
> > > > > > >>>> this is related to the missing attribute that Yingyi
> > > > > > >>>> mentioned today?)
> > > > > > >>>
> > > > > > >>> Assign null value.
> > > > > > >>> Missing means the field doesn't exist in a record at all.
> > > > > >>> Best,
> > > > > >>> Yingyi
> > > > > >>>
> > > > > >>>
> > > > > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <xikuiw@uci.edu> wrote:
> > > > > >>>
> > > > > > >>>> Hi devs,
> > > > > > >>>>
> > > > > > >>>> I came across several questions while I was constructing
> > > > > > >>>> records in AsterixDB.  Hope someone can help me clear up
> > > > > > >>>> the confusion. :)
> > > > > > >>>>
> > > > > > >>>> 1. Write directly to data output or use
> > > > > > >>>> SerializerDeserializer
> > > > > > >>>> I am working with AbstractDataParser now. I see people
> > > > > > >>>> using different ways to append attributes to the data
> > > > > > >>>> output. Either use:
> > > > > > >>>> output.write(typetag.serialize());
> > > > > > >>>> output.writeInt(0);
> > > > > > >>>> to write into the data output directly, or
> > > > > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to
> > > > > > >>>> serialize an AINT8 instance to the output.
> > > > > > >>>> *SerializerDeserializer uses writeByte to write the
> > > > > > >>>> output.
> > > > > > >>>>
> > > > > > >>>> My feeling is SerializerDeserializer offers another level
> > > > > > >>>> of abstraction, but with output I can write values
> > > > > > >>>> directly without constructing an AType object.
> > > > > > >>>> I am wondering if there are any preferences over these
> > > > > > >>>> two?
> > > > > > >>>>
> > > > > > >>>> 2. RecordType validation after the parser but before
> > > > > > >>>> adding to the frame?
> > > > > > >>>> My observation is that after the parser finishes writing
> > > > > > >>>> the output and passes it to the next level, there is no
> > > > > > >>>> validation that checks whether a non-optional field is
> > > > > > >>>> null. In other words, the parser has to guarantee that
> > > > > > >>>> the processed records match the dataset definition (a
> > > > > > >>>> non-optional attribute cannot have a null value). I tried
> > > > > > >>>> assigning a null value to a non-null attribute. It is
> > > > > > >>>> inserted successfully, but reading the records back has
> > > > > > >>>> problems.
> > > > > > >>>>
> > > > > > >>>> 3. Set to null or skip
> > > > > > >>>> For optional (nullable) attributes, if I want to insert a
> > > > > > >>>> record with a null value on that attribute, should I
> > > > > > >>>> assign a null value or should I just skip it? (Probably
> > > > > > >>>> this is related to the missing attribute that Yingyi
> > > > > > >>>> mentioned today?)
> > > > > >>>>
> > > > > >>>> Thanks for your help.
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Xikui
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
>
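
The "check after the parser, before addTuple" idea discussed in this thread can be sketched roughly as follows. This is a minimal illustration in plain Java, not AsterixDB code: NonNullFilter, accepts and filter are hypothetical names, and records are modelled as maps, where an absent key stands in for a missing field and a key mapped to null stands in for an explicit null. A real implementation would read the required fields from the type given on the create feed statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is not an AsterixDB class.
class NonNullFilter {
    // Names of the fields that the declared type marks as non-optional.
    private final List<String> requiredFields;

    NonNullFilter(List<String> requiredFields) {
        this.requiredFields = new ArrayList<>(requiredFields);
    }

    // True only if every required field is present and non-null.
    // Map.get returns null both for an absent key (missing) and for a
    // key mapped to null (explicit null), so both cases are rejected.
    boolean accepts(Map<String, Object> record) {
        for (String field : requiredFields) {
            if (record.get(field) == null) {
                return false;
            }
        }
        return true;
    }

    // Keeps the accepted records and silently drops the rest, instead
    // of throwing an exception for each bad record.
    List<Map<String, Object>> filter(List<Map<String, Object>> parsed) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> record : parsed) {
            if (accepts(record)) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

Dropping records in a separate step like this keeps the parser free of per-record exceptions, at the cost of one extra pass over the required fields of each record.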
