hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-80) binary key
Date Wed, 15 Mar 2006 07:22:56 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-80?page=comments#action_12370477 ] 

Owen O'Malley commented on HADOOP-80:

> 1. Why call setSize(0) in read()? This looks like a no-op. Am I missing something?

Yeah, that is a little subtle and I should have commented it better. When I do the second
setSize, it may call setCapacity, which copies the current data into the new buffer. If the
current size is 0, there is nothing to copy. It doesn't change the user-visible behavior, but
it saves a useless copy.
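
To make the setSize(0) point concrete, here is a minimal sketch of a growable byte buffer. It is not the code from binary-key.patch: the class GrowableBytes, the load method, the 3/2 growth factor, and the bytesCopied counter are all hypothetical and exist only to show why resetting the size first can avoid a copy when the buffer has to grow.

```java
/** Hypothetical stand-in for a growable byte buffer; not the actual patch code. */
final class GrowableBytes {
    private byte[] bytes = new byte[0];
    private int size = 0;

    /** Counts bytes copied during reallocation. For demonstration only. */
    long bytesCopied = 0;

    int getSize() { return size; }

    int getCapacity() { return bytes.length; }

    /** Reallocates the backing array, preserving the first `size` bytes. */
    void setCapacity(int newCapacity) {
        if (newCapacity != bytes.length) {
            byte[] fresh = new byte[newCapacity];
            if (newCapacity < size) {
                size = newCapacity;
            }
            if (size > 0) {
                // This is the copy that is wasted when the old data is about to be overwritten.
                System.arraycopy(bytes, 0, fresh, 0, size);
                bytesCopied += size;
            }
            bytes = fresh;
        }
    }

    /** Sets the logical size, growing the backing array if needed (assumed 3/2 growth). */
    void setSize(int newSize) {
        if (newSize > bytes.length) {
            setCapacity(newSize * 3 / 2);
        }
        size = newSize;
    }

    /** Replaces the contents with src, the way a deserializing read() would. */
    void load(byte[] src, boolean resetFirst) {
        if (resetFirst) {
            setSize(0); // old contents are dead, so tell setCapacity there is nothing to keep
        }
        setSize(src.length);
        System.arraycopy(src, 0, bytes, 0, src.length);
    }

    public static void main(String[] args) {
        GrowableBytes withReset = new GrowableBytes();
        withReset.load(new byte[10], true);
        withReset.load(new byte[100], true);

        GrowableBytes withoutReset = new GrowableBytes();
        withoutReset.load(new byte[10], false);
        withoutReset.load(new byte[100], false);

        System.out.println("copied with reset:    " + withReset.bytesCopied);
        System.out.println("copied without reset: " + withoutReset.bytesCopied);
    }
}
```

In this sketch both buffers end up with the same contents and size; the only difference is whether the stale bytes get copied into the new array before being overwritten.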

> 2. Why bother to use md5 for hashCode()? That could be expensive. Why not implement this like
> java.util.Arrays.hashCode() and UTF8.hashCode():

Yeah, I considered doing something lighter than md5, but using md5 prevents pathological cases
from doing bad things. We also use md5 a lot around here, so it is a really useful default
for us, but it might make sense to have a lighter hash alternative. However, since in map/reduce
the hash function is only used for partitioning the map output, it seemed better to use a
known good hash function than to take a chance on a fast but sloppy hash function.
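
For comparison, here is a rough sketch of the two options being discussed. It is an illustration under assumptions, not the patch itself: the class KeyHashes and its methods are hypothetical, folding the first four digest bytes into an int is just one way to derive a hashCode from md5, and the partition helper is a generic "non-negative hash modulo partition count" rule rather than a quote of the map/reduce partitioner.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helpers contrasting a cheap polynomial hash with an md5-derived one. */
final class KeyHashes {

    /** Same recurrence as java.util.Arrays.hashCode(byte[]), limited to `length` bytes. */
    static int cheapHash(byte[] data, int length) {
        int h = 1;
        for (int i = 0; i < length; i++) {
            h = 31 * h + data[i];
        }
        return h;
    }

    /** md5-derived hash: folds the first four digest bytes into an int (an assumed choice). */
    static int md5Hash(byte[] data, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, 0, length);
            byte[] d = md.digest();
            return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                 | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Maps a hash to a partition index in [0, numPartitions). */
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] k1 = "Aa".getBytes(StandardCharsets.US_ASCII);
        byte[] k2 = "BB".getBytes(StandardCharsets.US_ASCII);

        // Two distinct keys that collide under the cheap polynomial hash.
        System.out.println("cheap: " + cheapHash(k1, k1.length) + " vs " + cheapHash(k2, k2.length));
        System.out.println("md5:   " + md5Hash(k1, k1.length) + " vs " + md5Hash(k2, k2.length));
    }
}
```

The "Aa" and "BB" pair is a small example of the kind of structured input that collides under the polynomial hash, which is the sort of case a well-mixed hash is meant to guard against. The cost is one digest computation per key.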

> binary key
> ----------
>          Key: HADOOP-80
>          URL: http://issues.apache.org/jira/browse/HADOOP-80
>      Project: Hadoop
>         Type: New Feature
>   Components: io
>     Versions: 0.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.1
>  Attachments: binary-key.patch
> I needed a binary key type, so I extended BytesWritable to be comparable also.

This message is automatically generated by JIRA.