asterixdb-dev mailing list archives

From Jianfeng Jia <jianfeng....@gmail.com>
Subject Re: Possible Race condition in the new UTF8String implementation
Date Wed, 11 Nov 2015 19:54:36 GMT
Here are my findings and thoughts.
I think I’ve checked all the direct uses of UTF8SerDer. However, I missed some indirect
static/shared uses of it.

One big suspect is the RecordDescriptor, which holds the ISerializerDeserializers and
is always passed into the factory method and shared by the per-thread objects (usually a NodePushable).

E.g., in the ResultWriterOperatorDescriptor, the outRecordDesc is passed to the createPushRuntime()
factory method to create the “resultSerializer”, and it is shared by the per-thread
AbstractUnaryInputSinkOperatorNodePushable. This pushable object gets the deserializer directly
from the shared recordDescriptor.getFields()[i]. That would explain ASTERIXDB-1164.

I guess in your case there must be some deserializers supplied by a shared RecordDescriptor. That
leads to a race condition whenever a UTF8StringSerDer is involved.
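
To make the suspected interleaving concrete, here is a minimal sketch. The class and method names
(StatefulStringSerDe, load, decode) are hypothetical stand-ins, not the real
UTF8StringSerializerDeserializer API, and the two "threads" are replayed on a single thread so
that the outcome is deterministic:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a SerDe that reuses an internal byte[] between
// calls. This is an assumption for illustration, not the actual Hyracks class.
final class StatefulStringSerDe {
    private byte[] buffer = new byte[16]; // reused across calls: shared mutable state
    private int length;

    // Step 1: copy the encoded bytes into the reusable buffer.
    void load(byte[] encoded) {
        if (buffer.length < encoded.length) {
            buffer = new byte[encoded.length];
        }
        System.arraycopy(encoded, 0, buffer, 0, encoded.length);
        length = encoded.length;
    }

    // Step 2: decode whatever currently sits in the buffer.
    String decode() {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }
}

final class SharedSerDeRaceSketch {
    // Replays one possible interleaving of two threads that share one instance.
    static String interleavedRead(String keyA, String keyB) {
        StatefulStringSerDe shared = new StatefulStringSerDe();
        shared.load(keyA.getBytes(StandardCharsets.UTF_8)); // "thread A", step 1
        shared.load(keyB.getBytes(StandardCharsets.UTF_8)); // "thread B", step 1 overwrites A's bytes
        return shared.decode();                             // "thread A", step 2 now sees B's key
    }

    public static void main(String[] args) {
        System.out.println(interleavedRead("key-1", "key-2")); // prints "key-2"
    }
}
```

If two pushables really did read string keys this way, both could end up with the same key, which
would be consistent with the sporadic duplicate key exceptions, though I have not confirmed that
this is the actual code path.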

Given that the SerDers are stored in the shared RecordDescriptor, I think the original
design intended all SerDers to be thread-safe. There may also be other data structures
that store SerDers and are passed around and used in the same way. So I’d propose rolling
the UTF8SerDer back to the stateless version (at the expense of creating an intermediate buffer
array per record).
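
As a rough sketch of what "stateless" means here (again with hypothetical names, not the actual
pre-change implementation): the SerDe keeps no fields, allocates its scratch array inside the
call, and can therefore be shared freely as a singleton.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateless SerDe: no instance fields, so sharing it is safe.
final class StatelessStringSerDe {
    static final StatelessStringSerDe INSTANCE = new StatelessStringSerDe();

    private StatelessStringSerDe() {
    }

    String deserialize(byte[] encoded) {
        // The cost mentioned above: one intermediate array per record.
        byte[] scratch = new byte[encoded.length];
        System.arraycopy(encoded, 0, scratch, 0, encoded.length);
        return new String(scratch, StandardCharsets.UTF_8);
    }
}

final class StatelessSerDeSketch {
    // Each worker repeatedly deserializes its own key through the shared
    // singleton and counts how often the result differs from what it wrote.
    static int countMismatches(int threads, int recordsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final String key = "key-" + t;
                final byte[] encoded = key.getBytes(StandardCharsets.UTF_8);
                results.add(pool.submit(() -> {
                    int bad = 0;
                    for (int i = 0; i < recordsPerThread; i++) {
                        if (!key.equals(StatelessStringSerDe.INSTANCE.deserialize(encoded))) {
                            bad++;
                        }
                    }
                    return bad;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(4, 10_000)); // prints 0
    }
}
```

Because nothing is shared between calls, the mismatch count stays at zero no matter how the
threads interleave; the price is the extra allocation and copy for every record.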

Any opinions? 


> On Nov 11, 2015, at 10:54 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
> 
> That was my first thought and so I changed it. The issue is still there.
> I am also using the UTF8StringSerializerDeserializer to deserialize the
> strings and they always serialize it correctly.
> 
> I am thinking maybe it is related to the UTF8StringPointable but I am not
> sure how that could be.
> I am looking at this as well,
> Abdullah.
> 
> Amoudi, Abdullah.
> 
> On Wed, Nov 11, 2015 at 8:05 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
> wrote:
> 
>> The possible racing condition could be that the
>> UTF8StringSerializerDeserializer now is not a singleton method any more. It
>> was implemented to reuse the byte[] that serialize/deserialize the string
>> object. Let me look into this issue.
>> 
>>> On Nov 11, 2015, at 8:37 AM, abdullah alamoudi <bamousaa@gmail.com>
>> wrote:
>>> 
>>> Highly probable.
>>> Please, let's fix this soon.
>>> 
>>> Amoudi, Abdullah.
>>> 
>>> On Wed, Nov 11, 2015 at 7:32 PM, Till Westmann <tillw@apache.org> wrote:
>>> 
>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1164
>>>> might be related.
>>>> 
>>>> Cheers,
>>>> Till
>>>> 
>>>> On 11 Nov 2015, at 8:25, abdullah alamoudi wrote:
>>>> 
>>>>> Hi all,
>>>>> I am having a hard time figuring this out. Here are the symptoms I am
>>>>> seeing in case one has an idea what this could be.
>>>>> 
>>>>> I have a feed running ingesting data into a dataset. sporadically, I
>> get
>>>>> duplicate key exception errors (The key is of a string type) and I am
>>>> 100%
>>>>> sure that I don't have duplicate records.
>>>>> 
>>>>> Moreover, I am printing the content of the frames about to be inserted
>>>> into
>>>>> the primary index and there are no duplicate records.
>>>>> 
>>>>> There are three reasons why I am suspecting the String implementation:
>>>>> 1. It is fairly recent change.
>>>>> 2. When I run on a single node, or run one thread at a time, I never
>> get
>>>>> this exception.
>>>>> 3. the key is a String.
>>>>> 
>>>>> I have looked at the change trying to figure out where a race condition
>>>>> might take place but it is well hidden (if it is true at all.).
>>>>> 
>>>>> Let me know if you have seen something similar.
>>>>> 
>>>>> Cheers,
>>>>> Abdullah.
>>>> 
>> 
>> 
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine

