Mailing-List: contact dev-help@asterixdb.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@asterixdb.incubator.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAOQkTyOTz2C98_bZjhVCpFtQ2mS0LF=WkHEsb2wdESmQ9YAojA@mail.gmail.com>
References: 
 <CAN-bgPqZKatzFFyrp_406X0TwRcBLk67f-H6T4Jyx8dLujkpSg@mail.gmail.com>
	<CALjEsF4jjegtAn9-4gdTMXrKB6WGRphZUcNEK6r27dM2A3Begg@mail.gmail.com>
	<CALjEsF7Dtuf3vT0cuHzfxYwihLo+vi8z7EWnUgjvwZDim-HK6g@mail.gmail.com>
	<5723E463.4060704@gmail.com>
	<CALjEsF6uwdJyNX+_fJ4VuEwPtPfnxWMB1CQU6i+Y=PYhfyuGhA@mail.gmail.com>
	<CAN-bgPoX9WQRBDCGowTPJiyOzZOUUJPYTWrVUS0eEEY-fbzGmw@mail.gmail.com>
	<CAOQkTyOTz2C98_bZjhVCpFtQ2mS0LF=WkHEsb2wdESmQ9YAojA@mail.gmail.com>
Date: Sat, 30 Apr 2016 15:25:58 -0700
Message-ID: 
 <CAN-bgPqj=NPPgeRg=dO7eencfk2fMo-bJtE2M6EroPy66swNow@mail.gmail.com>
Subject: Re: Questions of building record in AsterixDB
From: Xikui Wang <xikuiw@uci.edu>
To: dev@asterixdb.incubator.apache.org
Content-Type: multipart/alternative; boundary=001a11c00b40578aaf0531bb3ebd

--001a11c00b40578aaf0531bb3ebd
Content-Type: text/plain; charset=UTF-8

Hi Abdullah,

Actually, I share the concern that adding a null check for the general case
would add extra overhead. So I plan to add the check after the parser but
before addTuple, i.e. in FeedRecordDataFlowController. From what I have seen
so far, though, the RecordType does not seem to be visible to
FeedRecordDataFlowController, so I am still investigating that...

I saw the null check in the ADM parser. That is a viable way to handle this
within the parser's scope, but I am looking for a slightly different solution.
As I see it, the ADM parser assumes that the input ADM conforms to the dataset
definition, so it is reasonable for it to throw an exception. For Tweetparser,
if I see a null value on a non-nullable attribute, I will discard the whole
tweet directly and may not even log it (too many tweets have nulls).
That is why I want to put the check in FeedRecordDataFlowController: I have
not seen a good way to prevent a record from being inserted from within the
parser, other than throwing an exception.
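
To make the idea concrete, here is a rough sketch of the kind of filter I
have in mind. It uses plain Java collections as stand-ins for parsed records
and for the dataset definition; NullFilterSketch, conforms and filter are
made-up names for illustration only, not existing AsterixDB classes or
methods, and the real check would have to work on the serialized record
together with its RecordType.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in AsterixDB.
final class NullFilterSketch {

    /** True when every non-nullable field is present with a non-null value. */
    static boolean conforms(Map<String, Object> record, Set<String> nonNullableFields) {
        for (String field : nonNullableFields) {
            // get() returns null for both an absent key and an explicit null value.
            if (record.get(field) == null) {
                return false;
            }
        }
        return true;
    }

    /** Keeps conforming records and silently drops the rest (no exception, no log). */
    static List<Map<String, Object>> filter(List<Map<String, Object>> parsed,
            Set<String> nonNullableFields) {
        List<Map<String, Object>> accepted = new ArrayList<>();
        for (Map<String, Object> record : parsed) {
            if (conforms(record, nonNullableFields)) {
                accepted.add(record); // stands in for the addTuple step
            }
        }
        return accepted;
    }
}
```

The point of the sketch is only that a non-conforming record is dropped
between parsing and adding it to the frame, instead of the parser throwing.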

Not sure whether my opinion makes sense. Feel free to comment. :)

Best,
Xikui

On Sat, Apr 30, 2016 at 1:52 PM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> Adding a few points here:
>
> My feeling is SerializerDeserializer offers another level of abstraction
> but with output I can write value directly without construct AType object.
> I am wondering if there are any preferences over these two?
>
> - Using the SerializerDeserializer option, you will only create a single
> object regardless of the number of parsed records, so I wouldn't worry
> about it. Code maintainability takes precedence here IMO.
> - In addition to records and lists, UTF8StringSerializerDeserializer can be
> stateful for the same reason (avoid creating lots of unneeded objects). In
> fact, our parsers use the stateful UTF8StringSerializerDeserializer since I
> noticed that using the stateless one creates lots of byte[] and triggers GC
> over and over.
> - Right now, we parse missing values as null. Should that change?
> - There is definitely a check for nulls on non-nullable values at least in
> the ADM parser. There might be a bug however that makes it accept explicit
> null values and that should be fixed.
>
> I am for NOT using the cast record solution because of the overhead it will
> add, but that is just me :)
> ~Abdullah.
>
>
> On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <xikuiw@uci.edu> wrote:
>
> > Thank you Yingyi. I will try to figure out a solution from that
> direction.
> >
> > Best,
> > Xikui
> >
> > On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >
> > > Yeah, I think so:-)
> > >
> > > Best,
> > > Yingyi
> > >
> > > On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <dtabass@gmail.com> wrote:
> > >
> > > > This indeed might be cleaner?
> > > >
> > > >
> > > > On 4/29/16 3:28 PM, Yingyi Bu wrote:
> > > >
> > > >> I'm guessing that you can do similar things to CastRecordDescriptor
> > > >>>> if you want to handle general cases in that region.
> > > >>>>
> > > >>> Or, you can inject a cast-record function in the loading pipeline
> > > >> so that you can defer the runtime-type-check/cast to that function
> > > instead
> > > >> of doing that in the parser.
> > > >>
> > > >>
> > > >> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <buyingyi@gmail.com>
> > wrote:
> > > >>
> > > >> My answer is inlined.
> > > >>>
> > > >>> My feeling is SerializerDeserializer offers another level of
> > > abstraction
> > > >>>>> but with output I can write value directly without construct
> AType
> > > >>>>>
> > > >>>> object.
> > > >>>
> > > >>>> I am wondering if there are any preferences over these two?
> > > >>>>>
> > > >>>> I agree with you. However, a SerializerDeserializer has to be
> > > stateless,
> > > >>> hence it cannot be used at runtime for complex type objects such as
> > > >>> records and lists,
> > > >>> because it will create a lot Java objects.
> > > >>>
> > > >>> in other words, parser has to guarantee that the
> > > >>>>> processed records has to match the dataset
> definition(non-optional
> > > >>>>> attribute cannot have null value). I tried to assign null value
> to
> > > >>>>>
> > > >>>> non-null
> > > >>>
> > > >>>> attributes. It will be inserted successfully but read records will
> > > have
> > > >>>>> problem.
> > > >>>>>
> > > >>>> That sounds right to me.  Please file a JIRA issue and assign to
> > you (
> > > >>> if you're working on that).
> > > >>> I'm guessing that you can do similar things to CastRecordDescriptor
> > > >>> if you want to handle general cases in that region.
> > > >>>
> > > >>> 3. Set to null or skip
> > > >>>>> For optional(nullable) attributes, if I want to insert a record
> > with
> > > >>>>>
> > > >>>> null
> > > >>>
> > > >>>> value on that attribute. Should I assign null value or should I
> just
> > > >>>>>
> > > >>>> skip
> > > >>>
> > > >>>> it? (Probably this is related to the missing attribute that Yingyi
> > > >>>>> mentioned today?)
> > > >>>>>
> > > >>>> Assign null value.
> > > >>> Missing means the field doesn't exist in a record at all.
> > > >>>
> > > >>> Best,
> > > >>> Yingyi
> > > >>>
> > > >>>
> > > >>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <xikuiw@uci.edu>
> wrote:
> > > >>>
> > > >>> Hi devs,
> > > >>>>
> > > >>>> I came across several questions while I was constructing records
> in
> > > >>>> AsterixDB.  Hope someone can help me clear the confusion. :)
> > > >>>>
> > > >>>> 1. Write directly to data output or use SerializerDeserializer
> > > >>>> I am working with AbstractDataParser now. I see people using
> > different
> > > >>>> ways
> > > >>>> to append attributes to data output. Either use:
> > > >>>> output.Write(typetag.serialize());
> > > >>>> output.WriteInt(0);
> > > >>>> to write into data output directly, or
> > > >>>> use AInt8SerializerDeserializer.serialize(int8Serde) to serialize
> a
> > > >>>> AINT8
> > > >>>> instance to output. *SerializerDeserializer uses writeByte to
> write
> > > >>>> output.
> > > >>>>
> > > >>>> My feeling is SerializerDeserializer offers another level of
> > > abstraction
> > > >>>> but with output I can write value directly without construct AType
> > > >>>> object.
> > > >>>> I am wondering if there are any preferences over these two?
> > > >>>>
> > > >>>> 2. RecordType validation after parser but before add to frame?
> > > >>>> My observation is after parser finish writing the output and pass
> it
> > > to
> > > >>>> next level, there is no such validation that checks whether
> > > non-optional
> > > >>>> field is null or not. In other words, parser has to guarantee that
> > the
> > > >>>> processed records has to match the dataset definition(non-optional
> > > >>>> attribute cannot have null value). I tried to assign null value to
> > > >>>> non-null
> > > >>>> attributes. It will be inserted successfully but read records will
> > > have
> > > >>>> problem.
> > > >>>>
> > > >>>> 3. Set to null or skip
> > > >>>> For optional(nullable) attributes, if I want to insert a record
> with
> > > >>>> null
> > > >>>> value on that attribute. Should I assign null value or should I
> just
> > > >>>> skip
> > > >>>> it? (Probably this is related to the missing attribute that Yingyi
> > > >>>> mentioned today?)
> > > >>>>
> > > >>>> Thanks for your help.
> > > >>>>
> > > >>>> Best,
> > > >>>> Xikui
> > > >>>>
> > > >>>>
> > > >>>
> > > >
> > >
> >
>

--001a11c00b40578aaf0531bb3ebd--