hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [jira] Commented: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
Date Mon, 03 Apr 2006 21:56:07 GMT
Konstantin Shvachko wrote:
> Here is another example that I dealt with.
> I wanted to use different value types (long, float or string) for both
> map and reduce tasks, depending on the actual key values. So the
> solution was to encode the value type into the key.
> I used keys of the form
> l:<name> - indicating the value type is expected to be long
> f:<name> - value type is expected to be float
> s:<name> - value is a string
> The example is under HADOOP-95.
> Thought somebody might find it practical.
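The key-prefix scheme quoted above can be sketched in plain Java. This is a hypothetical illustration, not the HADOOP-95 code: the class and method names are invented, keys and values are plain Strings rather than Hadoop Writables, and the reduce rule (sum longs, sum floats, join strings) is an assumption chosen only to show dispatching on the prefix.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the value type is encoded as a one-letter prefix on the key,
// e.g. "l:count" (long), "f:ratio" (float), "s:host" (string).
public class TypedKeyDemo {

    // Verify that the key has the form "<t>:<name>" and return the type letter.
    static char typeOf(String key) {
        if (key.length() < 2 || key.charAt(1) != ':') {
            throw new IllegalArgumentException("key lacks a type prefix: " + key);
        }
        return key.charAt(0);
    }

    // Strip the "<t>:" prefix to recover the logical name.
    static String nameOf(String key) {
        typeOf(key); // validates the prefix
        return key.substring(2);
    }

    // Example reduce step (assumed semantics): parse each raw value according
    // to the key's prefix, then sum longs, sum floats, or join strings.
    static Object reduce(String key, List<String> rawValues) {
        switch (typeOf(key)) {
            case 'l': {
                long total = 0L;
                for (String v : rawValues) total += Long.parseLong(v);
                return total;
            }
            case 'f': {
                float total = 0f;
                for (String v : rawValues) total += Float.parseFloat(v);
                return total;
            }
            case 's':
                return String.join(",", rawValues);
            default:
                throw new IllegalArgumentException("unknown type prefix: " + key);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf("l:count") + " = " + reduce("l:count", Arrays.asList("3", "4")));
        System.out.println(nameOf("f:ratio") + " = " + reduce("f:ratio", Arrays.asList("1.5", "2.25")));
        System.out.println(nameOf("s:host") + " = " + reduce("s:host", Arrays.asList("a", "b")));
    }
}
```

The cost of this approach is that every map and reduce function must agree on the prefix convention; a key without a known prefix is rejected rather than silently misparsed.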

On a related note, ObjectWritable can be used as input or output type, 
and can wrap any Writable class, thus permitting polymorphic inputs and 
outputs.  Nutch uses this to, e.g., combine a URL's incoming anchor 
texts and its content when indexing.  The input type is ObjectWritable, 
and the indexer's InputFormat wraps values from a variety of files.  The 
indexing reducer can then use the 'instanceof' operator to determine how 
to process each input value.  To be more object-oriented, one could have 
all of these classes implement some Indexable interface whose methods 
are invoked when reducing.
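A minimal sketch of the two reducer styles described above, in plain Java. Assumptions: AnchorText, Content, Indexable and the method names are hypothetical. In a real job each value would arrive wrapped in an ObjectWritable and be unwrapped with its get() method; here the values are plain objects so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PolymorphicReduceDemo {

    /** Hypothetical interface: each input type knows how to add itself to a document. */
    interface Indexable {
        void addTo(List<String> docFields);
    }

    /** Hypothetical value type: one incoming anchor text for a URL. */
    static class AnchorText implements Indexable {
        final String text;
        AnchorText(String text) { this.text = text; }
        public void addTo(List<String> docFields) { docFields.add("anchor=" + text); }
    }

    /** Hypothetical value type: the fetched content of a URL. */
    static class Content implements Indexable {
        final String body;
        Content(String body) { this.body = body; }
        public void addTo(List<String> docFields) { docFields.add("content=" + body); }
    }

    // Style 1: the reducer inspects each unwrapped value with instanceof.
    static List<String> reduceWithInstanceof(List<Object> values) {
        List<String> fields = new ArrayList<>();
        for (Object v : values) {
            if (v instanceof AnchorText) {
                fields.add("anchor=" + ((AnchorText) v).text);
            } else if (v instanceof Content) {
                fields.add("content=" + ((Content) v).body);
            } else {
                throw new IllegalArgumentException("unexpected value type: " + v.getClass());
            }
        }
        return fields;
    }

    // Style 2: each value type implements Indexable, so the reducer just
    // calls the interface method and lets virtual dispatch pick the behavior.
    static List<String> reduceWithInterface(List<Object> values) {
        List<String> fields = new ArrayList<>();
        for (Object v : values) {
            ((Indexable) v).addTo(fields);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.<Object>asList(new AnchorText("apache"), new Content("<html>"));
        System.out.println(reduceWithInstanceof(values));
        System.out.println(reduceWithInterface(values));
    }
}
```

With the interface style, adding a new input type requires no change to the reducer; only the new class must implement Indexable.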

Doug
