incubator-crunch-dev mailing list archives

From Rahul <rsha...@xebia.com>
Subject min/max/sort APIs not in sync
Date Fri, 14 Sep 2012 13:33:24 GMT
Hi all,

We have min/max/sort APIs in Crunch. Min and max rely on S (the user
type) being Comparable, while the Sort API relies on the corresponding
Writable type being comparable, i.e. WritableComparable. To me, min and
max are special cases of sort, and the three should be in sync with
each other. If they are not, then at least in theory sorting could
produce results that differ from the min/max functions.

We could adopt the Sort approach for all three, but that API has some
issues: if the Writable is not comparable, the error will not be very
clear, and if S has a comparator that differs from the Writable's, the
results will not be what the user expects. Or maybe we could use a
Comparable S in the Sort API. I am not sure, but I think we would then
not be able to use the Hadoop shuffle and sort.

I do not have a complete idea of how to bring the three in sync. Any
thoughts? But first I would like to ask: should we even try to do that,
or am I just cooking up theory with no practical use case? There has
been some discussion on this in the CRUNCH-57
<https://issues.apache.org/jira/browse/CRUNCH-57> issue. Let me know
what you think.
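
To make the concern concrete, here is a minimal sketch in plain Java.
It does not use Crunch or Hadoop. The class and method names are
hypothetical, and a decimal string is only an assumed stand-in for a
serialized form whose ordering differs from the natural ordering of the
user type. It shows how the first and last elements of a sort can
disagree with min and max when the two orderings are not the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Crunch.
public class OrderingMismatch {

    // Min computed from the user type's natural ordering (Comparable).
    public static int minByNaturalOrder(List<Integer> values) {
        return Collections.min(values);
    }

    // Max computed from the user type's natural ordering (Comparable).
    public static int maxByNaturalOrder(List<Integer> values) {
        return Collections.max(values);
    }

    // Sort by the ordering of an assumed serialized form. Here the
    // decimal string of each value is compared lexicographically,
    // which differs from numeric order.
    public static List<Integer> sortBySerializedForm(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing((Integer v) -> Integer.toString(v)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(9, 10, 100);

        List<Integer> sorted = sortBySerializedForm(values);

        System.out.println("min = " + minByNaturalOrder(values));    // min = 9
        System.out.println("max = " + maxByNaturalOrder(values));    // max = 100
        System.out.println("sorted = " + sorted);                    // sorted = [10, 100, 9]
        System.out.println("first = " + sorted.get(0));              // first = 10
        System.out.println("last = " + sorted.get(sorted.size() - 1)); // last = 9
    }
}
```

Here min is 9 but the first sorted element is 10, and max is 100 but
the last sorted element is 9. Whether real Crunch types can diverge
this way depends on how each Writable implements its comparison.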

regards,
Rahul


