hadoop-common-user mailing list archives

From "Kylie McCormick" <kyliemccorm...@gmail.com>
Subject Re: Writable readFields and write functions
Date Tue, 15 Jul 2008 00:21:36 GMT
Hello Chris:
Thanks for the prompt reply!

So, to conclude from your note:
-- Presently, my RecordReader converts XML strings from a file to MyWritable
objects
-- When readFields is called, RecordReader should provide the next
MyWritable object, if there is one
-- When write is called, MyWriter should write the objects out

The RecordReader is record-oriented, but both the readFields and write
functions are byte-oriented... in order for Hadoop to be happy, I need to
coordinate my record-oriented code with those byte-oriented functions.

Is this correct? I just want to make sure before I tinker more with the
code, to have the design properly down.

Thanks!
Kylie


On Mon, Jul 14, 2008 at 3:43 PM, Chris Douglas <chrisdo@yahoo-inc.com>
wrote:

> It's easiest to consider write as a function that converts your record to
> bytes and readFields as a function restoring your record from bytes. So it
> should be the case that:
>
> MyWritable i = new MyWritable();
> i.initWithData(some_data);
> i.write(byte_stream);
> ...
> MyWritable j = new MyWritable();
> j.initWithData(some_other_data); // (1)
> j.readFields(byte_stream);
> assert i.equals(j);
>
> Note that the assert should be true whether or not (1) is present, i.e. a
> call to readFields should be deterministic and without hysteresis (it should
> make no difference whether the Writable is newly created or if it formerly
> held some other state). readFields must also consume the entire record, so
> for example, if write outputs three integers, readFields must consume three
> integers. Variable-sized Writables are common, but any optional/variably
> sized fields must be encoded to satisfy the preceding.
>
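A minimal sketch of the two rules above (no leftover state, and readFields consuming exactly what write produced), assuming a hypothetical record named TagListWritable with one variably sized field. It uses only java.io so it runs without Hadoop on the classpath; in a real job the class would declare that it implements org.apache.hadoop.io.Writable, whose two methods have the signatures used here. The roundTrip helper is also an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record with a variably sized field; not a Hadoop class.
class TagListWritable {
    final List<String> tags = new ArrayList<>();

    // Encode the element count first so readFields knows how much to consume.
    public void write(DataOutput out) throws IOException {
        out.writeInt(tags.size());
        for (String tag : tags) {
            out.writeUTF(tag);
        }
    }

    // Discard any previous state, then consume exactly one record.
    public void readFields(DataInput in) throws IOException {
        tags.clear();
        int count = in.readInt();
        for (int n = 0; n < count; n++) {
            tags.add(in.readUTF());
        }
    }

    // Example helper: serialize src, restore it into dst, and return the
    // number of bytes readFields left unread.
    static int roundTrip(TagListWritable src, TagListWritable dst) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buffer));
            ByteArrayInputStream bytes = new ByteArrayInputStream(buffer.toByteArray());
            dst.readFields(new DataInputStream(bytes));
            return bytes.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the tags.clear() call, a reused instance would keep its old elements and the round trip would no longer reproduce the original record, which is the hysteresis the note warns about.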
> So if your MyBigWritable record held two ints (integerA, integerB) and a
> MyWritable (my_writable), its write method might look like:
>
> out.writeInt(integerA);
> out.writeInt(integerB);
> my_writable.write(out);
>
> and readFields would restore:
>
> integerA = in.readInt();
> integerB = in.readInt();
> my_writable.readFields(in);
>
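Put together as a complete class, the compound record might look like the sketch below. The contents of MyWritable are not given in the thread, so a single String payload is assumed here purely for illustration; the field is spelled myWritable to follow Java naming, and toBytes/fromBytes are hypothetical helpers for trying the round trip outside Hadoop. In a real job both classes would implement org.apache.hadoop.io.Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for the inner record; the String payload is an assumption.
class MyWritable {
    String payload = "";

    public void write(DataOutput out) throws IOException {
        out.writeUTF(payload);
    }

    public void readFields(DataInput in) throws IOException {
        payload = in.readUTF();
    }
}

// Compound record: two ints followed by a nested MyWritable.
class MyBigWritable {
    int integerA;
    int integerB;
    final MyWritable myWritable = new MyWritable();

    public void write(DataOutput out) throws IOException {
        out.writeInt(integerA);
        out.writeInt(integerB);
        myWritable.write(out); // the nested record serializes itself
    }

    // Fields are restored in exactly the order write emitted them.
    public void readFields(DataInput in) throws IOException {
        integerA = in.readInt();
        integerB = in.readInt();
        myWritable.readFields(in);
    }

    // Example helper: serialize this record to a byte array.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Example helper: overwrite this record's state from serialized bytes.
    void fromBytes(byte[] bytes) {
        try {
            readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```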
> There are many examples in the source of simple, compound, and
> variably-sized Writables.
>
> Your RecordReader is responsible for providing a key and value to your map.
> Most generic formats rely on Writables or another mode of serialization to
> write and restore objects to/from structured byte sequences, but less
> generic InputFormats will create Writables from byte streams.
> TextInputFormat, for example, will create Text objects from newline-delimited
> files, though Text objects are not, themselves, encoded in the file. In
> contrast, a SequenceFile storing the same data will encode the Text object
> (using its write method) and will restore that object as encoded.
>
> The critical difference is that the framework needs to convert your record
> to a byte stream at various points- hence the Writable interface- while you
> may be more particular about the format from which you consume and the
> format to which you need your output to conform. Note that you can elect to
> use a different serialization framework if you prefer.
>
> If your data structure will be used as a key (implementing
> WritableComparable), it's strongly recommended that you implement a
> RawComparator, which can compare the serialized bytes directly without
> deserializing both arguments. -C
>
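A rough sketch of what comparing serialized bytes directly can look like, assuming a key whose serialized form begins with writeInt(integerA) and which sorts on that field alone. The compare method uses the same parameter list as Hadoop's RawComparator.compare (two byte arrays, each with a start offset and a length), but the class name IntPrefixRawComparator and its helpers are hypothetical, and the class is kept free of Hadoop imports so it runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical comparator for keys serialized as writeInt(integerA), writeInt(integerB).
class IntPrefixRawComparator {
    // Decode a big-endian int, the layout DataOutput.writeInt produces.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Orders two serialized keys by their leading int without building objects.
    // Assumption: ordering depends only on integerA, so the rest is ignored.
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Example helper: the bytes a key with these two fields would serialize to.
    static byte[] serialize(int integerA, int integerB) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(integerA);
            out.writeInt(integerB);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing the raw bytes lexicographically would mis-order negative values, which is why the sketch decodes the int before comparing.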
>
> On Jul 14, 2008, at 3:39 PM, Kylie McCormick wrote:
>
>> Hi There!
>> I'm currently working on code for my own Writable object (called
>> ServiceWritable) and I've been working off LongWritable for this one. I was
>> wondering, however, about the following two functions:
>>
>> public void readFields(java.io.DataInput in)
>> and
>> public void write(java.io.DataOutput out)
>>
>> I have my own RecordReader object to read in the complex type Service, and I
>> also have my own Writer object to write my complex type ResultSet for
>> output. In LongWritable, the code is very simple:
>>
>> value = in.readLong();
>> and
>> out.writeLong(value);
>>
>> Since I am dealing with more complex objects, the ObjectWritable won't help
>> me. I'm a little confused with the interaction here between my RecordReader
>> and Writer objects, because there does not seem to be any direct one. Can
>> someone help me out here?
>>
>> Thanks,
>> Kylie
>>
>
>


-- 
The Circle of the Dragon -- unlock the mystery that is the dragon.
http://www.blackdrago.com/index.html

"Light, seeking light, doth the light of light beguile!"
-- William Shakespeare's Love's Labor's Lost
