hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pete Wyckoff <pwyck...@facebook.com>
Subject Re: aerialization.Deserializer.deserialize method help
Date Fri, 12 Sep 2008 22:02:08 GMT

Specifically, line 75 of SequenceFileRecordReader:

>    boolean remaining = (in.next(key) != null);

Throws out the return value of SequenceFile.next which is the result of
deserialize(obj).

-- pete


On 9/12/08 2:28 PM, "Pete Wyckoff" <pwyckoff@facebook.com> wrote:

> 
> What I mean is let's say I plug in a deserializer that always returns a new
> Object - in that case, since everything is pass by value, the new object
> cannot make its way back to the SequenceFileRecordReader user.
> 
> While(sequenceFileRecordReader.next(mykey, myvalue)) {
>   // do something
> }
> 
> And then my deserializers one/both looks like:
> 
> T deserialize(T obj) {
>  // ignore obj
>   return new T(params);
> }
> 
> Obj would be the key or the value passed in by the user, but since I ignore
> it, basically what happens is the deserialized value actually gets thrown
> away. 
> 
> More specifically, it gets thrown away in SequenceFile.Reader I believe.
> 
> -- pete
> 
> 
> On 9/12/08 2:20 PM, "Chris Douglas" <chrisdo@yahoo-inc.com> wrote:
> 
>> If you pass in null to the deserializer, it creates a new instance and
>> returns it; passing in an instance reuses it.
>> 
>> I don't understand the disconnect between Deserializer and the
>> RecordReader. Does your RecordReader generate instances that only
>> share a common subtype T? You need separate Deserializers for K and V,
>> if that's the issue... -C
>> 
>> On Sep 12, 2008, at 2:01 PM, Pete Wyckoff wrote:
>> 
>>> 
>>> This method's signature is
>>> {code}
>>> T deserialize(T);
>>> {code}
>>> 
>>> But, the RecordReader next method is
>>> 
>>> {code}
>>> boolean next(K,V);
>>> {code}
>>> 
>>> So, if the deserialize method does not return the same T (i.e., K or
>>> V), how
>>> would this new Object be propagated back thru the RecordReader next
>>> method.
>>> 
>>> It seems the contract on the deserialize method is that it must
>>> return the
>>> same  T (although the javadocs say "may").
>>> 
>>> Am I missing something? And if not, why isn't the API boolean
>>> deserialize(T)
>>> ?
>>> 
>>> Thanks, pete
>>> 
>>> Ps for things like Thrift, there's no way to re-use the object as
>>> there's no
>>> clear method, so if this is the case, I don't see how it would work??
>>> 
>> 
> 


Mime
View raw message