hadoop-common-user mailing list archives

From Chris Douglas <chri...@yahoo-inc.com>
Subject Re: How do I implement a Writable into another Writable?
Date Mon, 20 Oct 2008 18:38:25 GMT
TupleWritable is not a general-purpose type. It's used for map-side  
joins, where the arity of a tuple is fixed by construction. Its intent  
is a transient type with very, very specific applications in mind.

It sounds like you don't need a general list type, as you don't need  
to worry about encoding the type of object your list contains.  
Writables are *not* supposed to read to the end of the stream they're  
given; they are to consume a full instance from the stream (i.e. it  
must consume all "its" bytes from a stream, even if it ultimately  
discards them). Given these constraints, Writable types of variable  
size almost always encode their length explicitly. As Joman mentioned,  
your constructor must initialize all its elements. Further, readFields  
must not retain any state from the value it formerly contained, so you  
need to clear the list before you add more values to it. This means  
your getNameList method will need to do a shallow copy of its elements  
if the caller stores a reference to the list.

This should work:

   public void readFields(DataInput in) throws IOException {
     nameList.clear();  // drop names left over from the previous record
     score = in.readDouble();
     final int len = WritableUtils.readVInt(in);  // how many names follow
     for (int i = 0; i < len; ++i) {
       nameList.add(Text.readString(in));
     }
   }

   public void write(DataOutput out) throws IOException {
     out.writeDouble(score);
     WritableUtils.writeVInt(out, nameList.size());  // explicit length prefix
     for (String name : nameList) {
       Text.writeString(out, name);
     }
   }
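Filled out into a whole class, the methods above might look like the following. This is a minimal sketch under stated assumptions, not code from the thread: to compile without Hadoop it declares a local `Writable` interface with the same two methods as `org.apache.hadoop.io.Writable`, and it substitutes `DataOutput.writeInt`/`writeUTF` for `WritableUtils.writeVInt`/`Text.writeString`, so its bytes are not compatible with the Hadoop version. `toBytes` and `readAll` are hypothetical helpers added only to exercise the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable,
// declared locally so this sketch compiles without Hadoop.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

class MyWritable implements Writable {
  private double score;
  // Created up front, never null, so readFields() works on a fresh instance.
  private final List<String> nameList = new ArrayList<>();

  MyWritable() {}

  MyWritable(double score, List<String> names) {
    this.score = score;
    nameList.addAll(names);
  }

  double getScore() { return score; }

  // Shallow copy: the caller's list is not changed by a later readFields().
  List<String> getNameList() { return new ArrayList<>(nameList); }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(score);
    out.writeInt(nameList.size());  // explicit length (Hadoop version: VInt)
    for (String name : nameList) {
      out.writeUTF(name);           // Hadoop version: Text.writeString
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    nameList.clear();               // forget the previous record
    score = in.readDouble();
    int len = in.readInt();
    for (int i = 0; i < len; ++i) {
      nameList.add(in.readUTF());
    }
  }

  // Hypothetical helper: serialize several records back to back in one buffer.
  static byte[] toBytes(MyWritable... records) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      for (MyWritable w : records) {
        w.write(out);
      }
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Hypothetical helper: read `count` records through ONE reused instance,
  // snapshotting each one before the next readFields() overwrites it.
  static List<MyWritable> readAll(byte[] bytes, int count) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      MyWritable reused = new MyWritable();
      List<MyWritable> records = new ArrayList<>();
      for (int i = 0; i < count; ++i) {
        reused.readFields(in);
        records.add(new MyWritable(reused.getScore(), reused.getNameList()));
      }
      if (in.available() != 0) {
        throw new IllegalStateException(in.available() + " unread bytes");
      }
      return records;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading two records written back to back, e.g. `(1.5, [a, b])` followed by `(2.0, [c])`, through one reused instance gives a second record whose name list is just `[c]`: the length prefix tells each `readFields` call where its record ends, and the `clear()` keeps `a` and `b` from leaking into it. The EOF-driven loop from the original question could not find that boundary, since it keeps reading into whatever follows.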

You can improve your performance by (re)using a collection of Text  
instead of String (since the latter is immutable), but that requires  
more work. -C
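The suggestion above, reusing mutable elements instead of allocating a fresh `String` per name on every `readFields`, might be sketched like this. Assumptions: `MutableName` is a hypothetical stand-in for Hadoop's `Text` (a UTF-8 buffer that `readFields` grows only when needed), the local `Writable` interface mirrors `org.apache.hadoop.io.Writable`, and `writeInt` stands in for VInt lengths. The list keeps a pool of elements plus a logical `size`, so a reused instance allocates only when a record has more names than any it has held before. `copyInto` is a hypothetical test helper.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for Hadoop's Text: a UTF-8 buffer overwritten in place.
class MutableName implements Writable {
  private byte[] bytes = new byte[0];
  private int length;

  void set(String s) {
    bytes = s.getBytes(StandardCharsets.UTF_8);
    length = bytes.length;
  }

  @Override
  public String toString() {
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(length);
    out.write(bytes, 0, length);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    length = in.readInt();
    if (bytes.length < length) {
      bytes = new byte[length];  // grow only when this value is longer
    }
    in.readFully(bytes, 0, length);
  }
}

class NameListWritable implements Writable {
  private final List<MutableName> pool = new ArrayList<>();  // first `size` are live
  private int size;
  private int allocations;  // MutableName objects created so far (for illustration)

  int size() { return size; }
  int allocations() { return allocations; }

  // Copies the value out as a String; the pooled element itself stays private.
  String get(int i) { return pool.get(Objects.checkIndex(i, size)).toString(); }

  void setNames(List<String> names) {
    size = 0;
    for (String name : names) {
      nextSlot().set(name);
    }
  }

  // Hands out the next pooled element, allocating only when the pool runs out.
  private MutableName nextSlot() {
    if (size == pool.size()) {
      pool.add(new MutableName());
      allocations++;
    }
    return pool.get(size++);
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(size);
    for (int i = 0; i < size; ++i) {
      pool.get(i).write(out);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    size = 0;  // reset the logical size; pooled elements stay allocated
    int len = in.readInt();
    for (int i = 0; i < len; ++i) {
      nextSlot().readFields(in);
    }
  }

  // Hypothetical helper: serialize `src`, then deserialize the bytes into `dst`.
  static void copyInto(NameListWritable src, NameListWritable dst) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      src.write(new DataOutputStream(buf));
      dst.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The extra work shows up in the API: callers go through `size()` and `get(i)` instead of a plain `List`, and any pooled element handed out directly would be overwritten by the next read. Here `get` copies the value out as a `String` for convenience, which gives back some of the savings.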

On Oct 19, 2008, at 3:39 PM, Yih Sun Khoo wrote:

> I think when it comes to the TupleWritable being part of a custom
> writable, you cannot just say tupleWritable.readFields(in) and
> tupleWritable.write(out).
>
> I might be wrong.  Has anyone successfully implemented a TupleWritable
> with, say, a DoubleWritable in a custom writable?
>
> On Sun, Oct 19, 2008 at 3:33 AM, Joman Chu <jomanchu@gmail.com> wrote:
>
>> Hrm, try implementing the read(DataInput in) method, as well as a
>> no-argument constructor MyWritable() that fills dummy values into your
>> instance variables. For example, this should be all you need for
>> read(DataInput in):
>>
>> public static MyWritable read(DataInput in) throws IOException {
>>       MyWritable w = new MyWritable();
>>       w.readFields(in);
>>       return w;
>> }
>>
>> EDIT: I was able to sort of replicate your error. In my constructor, I
>> had my instance variables assigned to null. Make sure you assign them
>> to new instances of whatever Writable you are using.
>>
>>
>> Joman Chu
>> http://www.notatypewriter.com/
>> AIM: ARcanUSNUMquam
>>
>>
>>
>> On Sun, Oct 19, 2008 at 5:10 AM, Yih Sun Khoo <yskhoo@gmail.com> wrote:
>>> Joman, to add a little bit more to one of my previous mails about the
>>> readFields methods:
>>>
>>> Have you ever had something like this?
>>>
>>> public class MyWritable implements Writable {
>>>   private DoubleWritable doubleWritable;
>>>   private TupleWritable tupleWritable;
>>>
>>>   public void readFields(DataInput in) throws IOException {
>>>       doubleWritable.readFields(in);
>>>       tupleWritable.readFields(in);
>>>   }
>>>
>>>   public void write(DataOutput out) throws IOException {
>>>       doubleWritable.write(out);
>>>       tupleWritable.write(out);
>>>   }
>>>
>>>
>>> }
>>>
>>> On Sun, Oct 19, 2008 at 1:59 AM, Joman Chu <jomanchu@gmail.com> wrote:
>>>
>>>> I've never used TupleWritable, so hopefully somebody else can help you
>>>> with that.
>>>> Joman Chu
>>>> http://www.notatypewriter.com/
>>>> AIM: ARcanUSNUMquam
>>>>
>>>>
>>>>
>>>> On Sun, Oct 19, 2008 at 4:40 AM, Yih Sun Khoo <yskhoo@gmail.com> wrote:
>>>>> Also, I've found TupleWritable to be quite useful.
>>>>> What are good techniques for using TupleWritable in a mapping phase for
>>>>> a "list of Text" when you do not know the size of that "list" ahead of
>>>>> time?
>>>>>
>>>>> Say I had a custom writable which contained a TupleWritable, and the
>>>>> custom writable had a setter method
>>>>> mycustomwritable.setTupleWritable( ... )
>>>>>
>>>>> Where the ellipsis is, there lies the TupleWritable. However, I'm
>>>>> wondering: since TupleWritable can be constructed using
>>>>> TupleWritable(Writable[]), how do I dynamically resize the Writable[]
>>>>> and add Text elements to it when I don't know the size of the
>>>>> Writable[] in advance? Does this make sense?
>>>>>
>>>>>
>>>>> On Sun, Oct 19, 2008 at 1:32 AM, Yih Sun Khoo <yskhoo@gmail.com> wrote:
>>>>>
>>>>>> Let's say in the reduce phase your value happens to hold an
>>>>>> ArrayListWritable. In this example, value is of type ArrayListWritable.
>>>>>> Maybe I've not thought about this or done this before, but how does one
>>>>>> "read data in from the DataInput stream" in the reduce phase so that the
>>>>>> ArrayListWritable, which is a value already passed to the reducer, can
>>>>>> be used as an ArrayList?
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 19, 2008 at 1:25 AM, Joman Chu <jomanchu@gmail.com> wrote:
>>>>>>
>>>>>>> Since the ArrayListWritable extends ArrayList, you have access to all
>>>>>>> the ArrayList methods as well. Once you read data in from the
>>>>>>> DataInput stream, you should be able to use ArrayListWritable just
>>>>>>> like a regular ArrayList.
>>>>>>> Joman Chu
>>>>>>> http://www.notatypewriter.com/
>>>>>>> AIM: ARcanUSNUMquam
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Oct 19, 2008 at 4:16 AM, Yih Sun Khoo <yskhoo@gmail.com> wrote:
>>>>>>>> Hmm, what method from ArrayListWritable allows you to access the
>>>>>>>> different elements of the ArrayList? Would it be readFields? For
>>>>>>>> example, in a reduce phase, if I needed to know the size of the array
>>>>>>>> list, it would be easy if I were dealing with an ArrayList because I
>>>>>>>> could just say arraylist.size(). How would I accomplish that with the
>>>>>>>> Writable counterpart?
>>>>>>>>
>>>>>>>> On Sun, Oct 19, 2008 at 1:04 AM, Joman Chu <jomanchu@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> For the ArrayList object, try taking a look at the implementation of
>>>>>>>>> ArrayListWritable by Jimmy Lin at UMD here:
>>>>>>>>>
>>>>>>>>> https://subversion.umiacs.umd.edu/umd-hadoop/core/trunk/src/edu/umd/cloud9/io/ArrayListWritable.java
>>>>>>>>>
>>>>>>>>> But basically, in the readFields methods I prefer using each Writable
>>>>>>>>> object's readFields method to read the data in. For example, for your
>>>>>>>>> double variable, I would use a DoubleWritable object, and in
>>>>>>>>> MyWritable.readFields(DataInput in), I would use
>>>>>>>>> nameofdoublewritable.readFields(in). For the
>>>>>>>>> MyWritable.write(DataOutput out) method, I would use
>>>>>>>>> nameofdoublewritable.write(out).
>>>>>>>>>
>>>>>>>>> Have a good one,
>>>>>>>>>
>>>>>>>>> Joman Chu
>>>>>>>>> http://www.notatypewriter.com/
>>>>>>>>> AIM: ARcanUSNUMquam
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Oct 19, 2008 at 3:30 AM, Yih Sun Khoo <yskhoo@gmail.com> wrote:
>>>>>>>>>> I don't quite know how to write the read and write functions, but I
>>>>>>>>>> want to write my own writable, which should have a DoubleWritable/double
>>>>>>>>>> value followed by a list of Strings/Text. This Writable will be used
>>>>>>>>>> as a value. Is the code below the best way to go about writing such a
>>>>>>>>>> writable?
>>>>>>>>>>
>>>>>>>>>> import java.io.DataInput;
>>>>>>>>>> import java.io.DataOutput;
>>>>>>>>>> import java.io.EOFException;
>>>>>>>>>> import java.io.IOException;
>>>>>>>>>> import java.util.ArrayList;
>>>>>>>>>>
>>>>>>>>>> import org.apache.hadoop.io.Writable;
>>>>>>>>>>
>>>>>>>>>> public class MyWritable implements Writable {
>>>>>>>>>>   private double score;
>>>>>>>>>>   private ArrayList<String> nameList;
>>>>>>>>>>
>>>>>>>>>>   public void setScore(double score) {
>>>>>>>>>>       this.score= score;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   public void setNameList(ArrayList<String> nameList) {
>>>>>>>>>>       this.nameList= nameList;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   public double getScore() {
>>>>>>>>>>       return score;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   public ArrayList<String> getNameList() {
>>>>>>>>>>       return nameList;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   public void readFields(DataInput in) throws IOException {
>>>>>>>>>>       score= in.readDouble();
>>>>>>>>>>       try {
>>>>>>>>>>           do {
>>>>>>>>>>               nameList.add(in.readUTF());
>>>>>>>>>>           } while (true);
>>>>>>>>>>       } catch (EOFException eofe) {
>>>>>>>>>>           // continue; done
>>>>>>>>>>       }
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   public void write(DataOutput out) throws IOException {
>>>>>>>>>>       out.writeDouble(score);
>>>>>>>>>>       for (String name: nameList) {
>>>>>>>>>>           out.writeUTF(name);
>>>>>>>>>>       }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
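The question in the subject line, putting one `Writable` inside another, comes down to two points from the thread: create every nested `Writable` in the constructor rather than leaving it null (Joman's fix), and have the outer `readFields` call the nested `readFields` methods in the same order the outer `write` called the nested `write` methods. A minimal sketch follows; `DoubleBox`, `TextBox`, `ScoredLabel`, and the `roundTrip` helper are hypothetical names, the local `Writable` interface mirrors `org.apache.hadoop.io.Writable` so the code compiles without Hadoop, and `TupleWritable` is left out per Chris's note that it is meant for map-side joins.

```java
import java.io.*;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for org.apache.hadoop.io.DoubleWritable.
class DoubleBox implements Writable {
  private double value;
  double get() { return value; }
  void set(double value) { this.value = value; }
  @Override public void write(DataOutput out) throws IOException { out.writeDouble(value); }
  @Override public void readFields(DataInput in) throws IOException { value = in.readDouble(); }
}

// Hypothetical stand-in for org.apache.hadoop.io.Text.
class TextBox implements Writable {
  private String value = "";
  String get() { return value; }
  void set(String value) { this.value = value; }
  @Override public void write(DataOutput out) throws IOException { out.writeUTF(value); }
  @Override public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

// Outer Writable that contains other Writables.
class ScoredLabel implements Writable {
  // Nested fields are created here, not left null: Hadoop generally builds a
  // Writable with its no-argument constructor and then calls readFields.
  private final DoubleBox score = new DoubleBox();
  private final TextBox label = new TextBox();

  DoubleBox score() { return score; }
  TextBox label() { return label; }

  @Override
  public void write(DataOutput out) throws IOException {
    score.write(out);  // each nested Writable writes its own bytes...
    label.write(out);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    score.readFields(in);  // ...and reads them back in the same order
    label.readFields(in);
  }

  // Hypothetical helper: serialize `src` and read it into a brand-new instance.
  static ScoredLabel roundTrip(ScoredLabel src) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      src.write(new DataOutputStream(buf));
      ScoredLabel copy = new ScoredLabel();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Swapping the two calls in only one of the methods would make `readFields` interpret the label's bytes as the score and misread the stream.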

