hadoop-common-dev mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
Date Mon, 03 Apr 2006 20:36:29 GMT
I agree that the framework must be as general as possible, which means
one should use a simple data structure for keys and values, such as a
string or BytesWritable. Nothing prevents us from implementing other
types on top of the framework as an optional, higher-level API layer.

Here is another example that I dealt with.
I wanted to use different value types (long, float or string) for both
map and reduce tasks, depending on the actual key values. The solution
was to encode the value type into the key itself.
I used keys of the form:
l:<name> - indicating the value type is expected to be long
f:<name> - value type is expected to be float
s:<name> - value is a string
The example is under HADOOP-95.
Thought somebody might find it practical.
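
For readers who want to try the idea without digging through the patch,
here is a minimal sketch of the key scheme above in plain Java. It is
not the code attached to HADOOP-95: the class name TypedKeys and its
methods are hypothetical, and it assumes values travel as strings and
are parsed according to the one-letter prefix of the key.

```java
// Hypothetical helper, not part of Hadoop and not the HADOOP-95 code.
// Assumption: every value arrives as a String and the key prefix
// ("l:", "f:" or "s:") says how to interpret it.
final class TypedKeys {
    private TypedKeys() {}

    /** Builds a key such as "l:count" from a type tag and a name. */
    static String encodeKey(char typeTag, String name) {
        if (typeTag != 'l' && typeTag != 'f' && typeTag != 's') {
            throw new IllegalArgumentException("unknown type tag: " + typeTag);
        }
        return typeTag + ":" + name;
    }

    /** Returns the <name> part of a key, without the type prefix. */
    static String nameOf(String key) {
        checkKey(key);
        return key.substring(2);
    }

    /** Parses rawValue as the type announced by the key prefix. */
    static Object decodeValue(String key, String rawValue) {
        checkKey(key);
        switch (key.charAt(0)) {
            case 'l':
                return Long.valueOf(rawValue);   // value expected to be long
            case 'f':
                return Float.valueOf(rawValue);  // value expected to be float
            case 's':
                return rawValue;                 // value is a string
            default:
                throw new IllegalArgumentException("unknown type tag in key: " + key);
        }
    }

    /** Rejects keys that do not look like "<tag>:<name>". */
    private static void checkKey(String key) {
        if (key == null || key.length() < 2 || key.charAt(1) != ':') {
            throw new IllegalArgumentException("malformed key: " + key);
        }
    }
}
```

A map or reduce function would call decodeValue(key, value) and branch
on the runtime type of the result. The cost of this approach is that
the type check moves from compile time to run time.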


Doug Cutting wrote:

> Eric Baldeschwieler wrote:
>> An observation...  this whole thread is about limits caused by type  
>> safety.  Interestingly, the other implementation of map-reduce does  
>> not support types at all.  Everything is a string.
>> So I agree that our departure from the paper is the problem.  ;-)
> A corollary is that one could simply use BytesWritable for all one's 
> keys and values, altering only one's WritableComparator 
> implementation, and one would not encounter this problem.  The use of 
> types in Hadoop is thus an optional feature.  One could even layer a 
> different type system on top of BytesWritable that exhibits the 
> desired properties.
>> I'm comfortable letting this lie for a while.  But I predict we've  
>> not heard the last of it.
> Owen seems to be picking it up, which is fine by me.
> Doug
