hadoop-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oded Rosen <o...@legolas-media.com>
Subject How to write a complex Writable
Date Sat, 22 May 2010 21:09:32 GMT
Since there are few sources on how to write a good Writable,
I want to share some tips I've learned over the last few days, while writing
a job that demanded some complex Writable object.
I will be glad to hear more tips, corrections, etc.

Wrtiables are a way to transfer complex data types from the mapper to the
reducer (/ combiner), or as a flexible output format for mapreduce jobs.
However, Hadoop does not always use them the way you think it does:
1. Hadoop reuses writable objects (at least the old API does) - so strange
data miraculously appears in your writables, if they are not cleaned right
before being used.
2. Hadoop compares Writables and hashes them a lot - so writing good
"hashCode" and "equals" functions is a necessity.
3. Hadoop needs an empty constructor for writables - so if you write another
constructor, be sure to also implement the empty one.

Any complex writable object you write (complex = more then just a couple of
fields) should:

*. Override Object's "*equals*": compare all available fields (deep
compare), check unique fields first, to avoid checking the rest.

*. Override Object's "*hashCode*": the simplest way is XORing (^) the hash
codes of the most important fields.

*. Create an *empty constructor* - even if you don't need one. Implementing
a different constructor is ok, as long as the empty is also available.

*. Implement (the mandatory) Writable's *readFields()* and *write()*. Use
versioning to allow scalability over time.
In the very begining of readFields(), clear all available fields (lists,
primitives, etc).
The best way to to do that is to create a clearFields() function, that will
be called both from "readFields()" and from the empty constructor.
Remember Hadoop reuses writables (again, at least the old API - "mapred" -
does), so this is not just a good habit, but clearly a must.

*. implement "*read()*" - this isn't mandatory but it's simple and helpful:

public static UserWritable read(DataInput in) throws IOException {
   UserWritable u = new UserWritable();
   return u;

More golden tips are welcomed. So does remarks.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message