asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Questions of building record in AsterixDB
Date Sun, 01 May 2016 00:39:45 GMT
@Yingyi for null/missing guidance?

On 4/30/16 1:52 PM, abdullah alamoudi wrote:
> Adding a few points here:
>
> My feeling is SerializerDeserializer offers another level of abstraction
> but with output I can write values directly without constructing an AType
> object. I am wondering if there are any preferences over these two?
>
> - Using the SerializerDeserializer option, you will only create a single
> object regardless of the number of parsed records, so I wouldn't worry
> about it. Code maintainability takes precedence here IMO.
> - In addition to records and lists, UTF8StringSerializerDeserializer can be
> stateful for the same reason (to avoid creating lots of unneeded objects).
> In fact, our parsers use the stateful UTF8StringSerializerDeserializer since
> I noticed that using the stateless one creates lots of byte[] and triggers
> GC over and over.
> - Right now, we parse missing values as null. Should that change?
> - There is definitely a check for nulls on non-nullable values at least in
> the ADM parser. There might be a bug however that makes it accept explicit
> null values and that should be fixed.
>
> I am for NOT using the cast record solution because of the overhead it
> will add, but that is just me :)
> ~Abdullah.
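
The allocation point above (a stateless string serializer creating a byte[] per value versus a stateful one reusing its buffers) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use AsterixDB's UTF8StringSerializerDeserializer, the class names `StringWriteSketch` and `StatefulWriter` are hypothetical, and the length-prefixed layout is chosen just for the example.

```java
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not AsterixDB code.
class StringWriteSketch {

    // Stateless style: every call allocates a fresh byte[] for the encoded string.
    static void writeStateless(String s, DataOutput out) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // assumed layout: 4-byte length, then the bytes
        out.write(bytes);
    }

    // Stateful style: one instance per parser keeps an encoder and a scratch
    // buffer, so the encoded bytes land in memory that is reused across calls.
    static final class StatefulWriter {
        private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        private ByteBuffer scratch = ByteBuffer.allocate(64);

        // Assumes well-formed input (no unpaired surrogates).
        void write(String s, DataOutput out) throws IOException {
            int worstCase = s.length() * 3; // UTF-8 uses at most 3 bytes per UTF-16 char
            if (scratch.capacity() < worstCase) {
                scratch = ByteBuffer.allocate(worstCase); // grow only when needed
            }
            scratch.clear();
            encoder.reset();
            encoder.encode(CharBuffer.wrap(s), scratch, true);
            encoder.flush(scratch);
            scratch.flip();
            out.writeInt(scratch.remaining());
            out.write(scratch.array(), 0, scratch.remaining());
        }
    }
}
```

Both paths write the same bytes for well-formed strings; the difference is only in how much garbage each call leaves behind, which is the trade-off being discussed.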
>
>
> On Sat, Apr 30, 2016 at 6:48 AM, Xikui Wang <xikuiw@uci.edu> wrote:
>
>> Thank you Yingyi. I will try to figure out a solution from that direction.
>>
>> Best,
>> Xikui
>>
>> On Fri, Apr 29, 2016 at 3:48 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>
>>> Yeah, I think so:-)
>>>
>>> Best,
>>> Yingyi
>>>
>>> On Fri, Apr 29, 2016 at 3:46 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>> This indeed might be cleaner?
>>>>
>>>>
>>>> On 4/29/16 3:28 PM, Yingyi Bu wrote:
>>>>
>>>>>> I'm guessing that you can do similar things to CastRecordDescriptor
>>>>>> if you want to handle general cases in that region.
>>>>>
>>>>> Or, you can inject a cast-record function in the loading pipeline
>>>>> so that you can defer the runtime-type-check/cast to that function
>>>>> instead of doing that in the parser.
>>>>>
>>>>>
>>>>> On Fri, Apr 29, 2016 at 3:25 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>> My answer is inlined.
>>>>>>> My feeling is SerializerDeserializer offers another level of
>>>>>>> abstraction but with output I can write values directly without
>>>>>>> constructing an AType object.
>>>>>>> I am wondering if there are any preferences over these two?
>>>>>>
>>>>>> I agree with you. However, a SerializerDeserializer has to be stateless,
>>>>>> hence it cannot be used at runtime for complex type objects such as
>>>>>> records and lists, because it will create a lot of Java objects.
>>>>>>
>>>>>>> in other words, parser has to guarantee that the
>>>>>>> processed records has to match the dataset definition(non-optional
>>>>>>> attribute cannot have null value). I tried to assign null value to
>>>>>>> non-null attributes. It will be inserted successfully but read records
>>>>>>> will have problem.
>>>>>>>>
>>>>>> That sounds right to me. Please file a JIRA issue and assign it to
>>>>>> yourself (if you're working on that).
>>>>>> I'm guessing that you can do similar things to CastRecordDescriptor
>>>>>> if you want to handle general cases in that region.
>>>>>>
>>>>>>> 3. Set to null or skip
>>>>>>> For optional(nullable) attributes, if I want to insert a record with
>>>>>>> null value on that attribute. Should I assign null value or should I
>>>>>>> just skip it? (Probably this is related to the missing attribute that
>>>>>>> Yingyi mentioned today?)
>>>>>>
>>>>>> Assign null value.
>>>>>> Missing means the field doesn't exist in a record at all.
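
The null versus missing distinction above can be illustrated with an ordinary Java map, where a key mapped to null stands in for a null field and an absent key stands in for a missing field. This is a minimal sketch, not AsterixDB's record representation; `NullVsMissingSketch` and `FieldState` are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch; a Map is only a stand-in for a record.
class NullVsMissingSketch {

    enum FieldState { VALUE, NULL, MISSING }

    // A field is MISSING when the record has no entry for it at all,
    // and NULL when the entry exists but holds no value.
    static FieldState stateOf(Map<String, Object> record, String field) {
        if (!record.containsKey(field)) {
            return FieldState.MISSING;
        }
        return record.get(field) == null ? FieldState.NULL : FieldState.VALUE;
    }
}
```

Under this reading, "assign null value" corresponds to storing the key with a null, while skipping the attribute would leave the key absent.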
>>>>>>
>>>>>> Best,
>>>>>> Yingyi
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 29, 2016 at 2:06 PM, Xikui Wang <xikuiw@uci.edu> wrote:
>>>>>>
>>>>>> Hi devs,
>>>>>>> I came across several questions while I was constructing records in
>>>>>>> AsterixDB.  Hope someone can help me clear the confusion. :)
>>>>>>>
>>>>>>> 1. Write directly to data output or use SerializerDeserializer
>>>>>>> I am working with AbstractDataParser now. I see people using different
>>>>>>> ways to append attributes to data output. Either use:
>>>>>>> output.write(typetag.serialize());
>>>>>>> output.writeInt(0);
>>>>>>> to write into data output directly, or
>>>>>>> use AInt8SerializerDeserializer.serialize(int8Serde) to serialize an
>>>>>>> AInt8 instance to output. *SerializerDeserializer uses writeByte to
>>>>>>> write output.
>>>>>>>
>>>>>>> My feeling is SerializerDeserializer offers another level of
>>>>>>> abstraction but with output I can write values directly without
>>>>>>> constructing an AType object.
>>>>>>> I am wondering if there are any preferences over these two?
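
The two styles in question 1 can be compared side by side in a small, self-contained Java sketch. Everything here is an assumption made for illustration: `TAG_INT8`, `Int8Value`, and `Int8Serde` are hypothetical stand-ins rather than AsterixDB's type tags or serializer classes, and the tag-then-value byte layout is not taken from the AsterixDB source.

```java
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch; names and byte layout are assumptions.
class DirectVsSerdeSketch {

    static final byte TAG_INT8 = 1; // made-up tag value for the example

    // Style 1: write the tag and the primitive straight to the output.
    // No value object is constructed.
    static void writeDirect(byte value, DataOutput out) throws IOException {
        out.writeByte(TAG_INT8);
        out.writeByte(value);
    }

    // Style 2: wrap the primitive in a value object and hand it to a serializer.
    static final class Int8Value {
        final byte value;

        Int8Value(byte value) {
            this.value = value;
        }
    }

    static final class Int8Serde {
        void serialize(Int8Value instance, DataOutput out) throws IOException {
            out.writeByte(TAG_INT8);
            out.writeByte(instance.value);
        }
    }
}
```

In this sketch both styles emit identical bytes, so the choice comes down to the extra abstraction and object construction of the second style versus the directness of the first.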
>>>>>>>
>>>>>>> 2. RecordType validation after parser but before add to frame?
>>>>>>> My observation is after parser finish writing the output and pass it to
>>>>>>> next level, there is no such validation that checks whether non-optional
>>>>>>> field is null or not. In other words, parser has to guarantee that the
>>>>>>> processed records has to match the dataset definition(non-optional
>>>>>>> attribute cannot have null value). I tried to assign null value to
>>>>>>> non-null attributes. It will be inserted successfully but read records
>>>>>>> will have
>>>>>>> problem.
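
The kind of check described in question 2 (rejecting a record whose non-optional field is null before it moves on) could look roughly like the following. This is a hedged sketch only: it models a record as a Java map, `RequiredFieldCheckSketch` and `validate` are hypothetical names, and it says nothing about where such a check would actually sit in AsterixDB.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a Map is only a stand-in for a parsed record.
class RequiredFieldCheckSketch {

    // Throws if any non-optional field is null or absent in the record.
    static void validate(Map<String, Object> record, List<String> requiredFields) {
        for (String field : requiredFields) {
            if (record.get(field) == null) {
                throw new IllegalArgumentException(
                        "Non-optional field '" + field + "' is null or absent");
            }
        }
    }
}
```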
>>>>>>>
>>>>>>> 3. Set to null or skip
>>>>>>> For optional(nullable) attributes, if I want to insert a record with
>>>>>>> null value on that attribute. Should I assign null value or should I
>>>>>>> just skip it? (Probably this is related to the missing attribute that
>>>>>>> Yingyi mentioned today?)
>>>>>>>
>>>>>>> Thanks for your help.
>>>>>>>
>>>>>>> Best,
>>>>>>> Xikui
>>>>>>>
>>>>>>>

