Date: Sun, 23 May 2010 00:09:32 +0300
Subject: How to write a complex Writable
From: Oded Rosen
To: general@hadoop.apache.org

Since there are few sources on how to write a good Writable, I want to share
some tips I've learned over the last few days while writing a job that required a complex Writable object. I will be glad to hear more tips, corrections, etc.

Writables are a way to transfer complex data types from the mapper to the reducer (or combiner), or to serve as a flexible output format for mapreduce jobs. However, Hadoop does not always use them the way you think it does:

1. Hadoop reuses writable objects (at least the old API does) - so stale data from a previous record shows up in your writables if they are not cleared right before being filled.
2. Hadoop compares Writables and hashes them a lot - so writing good "hashCode" and "equals" functions is a necessity.
3. Hadoop needs an empty constructor for writables - so if you write another constructor, be sure to also implement the empty one.

Any complex writable object you write (complex = more than just a couple of fields) should:

*. Override Object's "*equals*": compare all available fields (deep compare), and check the most distinguishing fields first, so that a mismatch is found before the rest are compared.
*. Override Object's "*hashCode*": the simplest way is XORing (^) the hash codes of the most important fields.
*. Create an *empty constructor* - even if you don't need one. Implementing a different constructor is ok, as long as the empty one is also available.
*. Implement (the mandatory) Writable's *readFields()* and *write()*. Use versioning so that the serialized format can change over time. At the very beginning of readFields(), clear all available fields (lists, primitives, etc). The best way to do that is to create a clearFields() function that is called both from "readFields()" and from the empty constructor. Remember that Hadoop reuses writables (again, at least the old API - "mapred" - does), so this is not just a good habit, but clearly a must.
*. Implement "*read()*" - this isn't mandatory, but it's simple and helpful:

    public static UserWritable read(DataInput in) throws IOException {
        UserWritable u = new UserWritable();
        u.readFields(in);
        return u;
    }

More golden tips are welcome, and so are remarks.

--
Oded
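
To make the checklist above concrete, here is a minimal sketch of a Writable that follows all of the tips. It is only an illustration, not a definitive implementation: the fields (userId, name, tags) and the VERSION constant are hypothetical, and the class deliberately does not declare "implements org.apache.hadoop.io.Writable" so that it compiles with the JDK alone. In a real job you would add that interface; the two method signatures already match it.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: a real job would also declare
// "implements org.apache.hadoop.io.Writable".
final class UserWritable {
    // Hypothetical format version, written first so the layout can evolve.
    private static final byte VERSION = 1;

    // Hypothetical fields, chosen only for illustration.
    private long userId;
    private String name;
    private final List<String> tags = new ArrayList<>();

    // Empty constructor: needed because Hadoop creates the instance itself.
    public UserWritable() {
        clearFields();
    }

    public UserWritable(long userId, String name, List<String> tags) {
        this.userId = userId;
        this.name = name;
        this.tags.addAll(tags);
    }

    // Resets every field so a reused instance carries no stale data.
    private void clearFields() {
        userId = 0L;
        name = "";
        tags.clear();
    }

    public void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);
        out.writeLong(userId);
        out.writeUTF(name);
        out.writeInt(tags.size());
        for (String tag : tags) {
            out.writeUTF(tag);
        }
    }

    public void readFields(DataInput in) throws IOException {
        clearFields(); // first thing: the object may hold a previous record
        byte version = in.readByte();
        if (version != VERSION) {
            throw new IOException("Unsupported UserWritable version: " + version);
        }
        userId = in.readLong();
        name = in.readUTF();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            tags.add(in.readUTF());
        }
    }

    // Optional convenience factory, as in the tip above.
    public static UserWritable read(DataInput in) throws IOException {
        UserWritable u = new UserWritable();
        u.readFields(in);
        return u;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserWritable)) return false;
        UserWritable other = (UserWritable) o;
        // Most distinguishing (and cheapest) field first, list last.
        return userId == other.userId
                && name.equals(other.name)
                && tags.equals(other.tags);
    }

    @Override
    public int hashCode() {
        // XOR of the hash codes of the most important fields.
        return Long.hashCode(userId) ^ name.hashCode();
    }
}
```

Note that hashCode() uses only fields that equals() also compares, so two equal objects always hash the same. The tags list is left out of the hash on purpose, to keep hashing cheap.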