hadoop-common-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: Any Way to Skip Mapping?
Date Mon, 03 Nov 2008 17:57:03 GMT
I need the reduce to sort so I can merge the records and output them in
sorted order.
I do not need to join any data, just merge rows together, so I do not think
the join will be any help.

I am storing the data like <row, value<key,data<data,timestamp>>>, with a
sorted map as the value.
On the merge I need to take all the rows that have the same key, merge all
the sorted maps together, and output one row that has all the data for that
key, something like what HBase is doing but without the in-memory indexes.
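A minimal sketch of that merge step in plain Java (no Hadoop classes), assuming each record's value is a sorted map from column key to a hypothetical `Cell` holding the data and its timestamp. The conflict rule is also an assumption: when two maps share a column key, the cell with the newer timestamp wins, loosely modeled on how HBase favors the latest version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class SortedMapMerge {

    /** Hypothetical cell type: a value plus the timestamp it was written at. */
    public static final class Cell {
        final String data;
        final long timestamp;

        public Cell(String data, long timestamp) {
            this.data = data;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return data + "@" + timestamp;
        }
    }

    /**
     * Merges every sorted map that shares one row key into a single sorted map.
     * Assumption: on a duplicate column key the newer timestamp wins;
     * on a timestamp tie the cell seen first is kept.
     */
    public static SortedMap<String, Cell> mergeRow(Iterable<SortedMap<String, Cell>> values) {
        SortedMap<String, Cell> merged = new TreeMap<>();
        for (SortedMap<String, Cell> map : values) {
            for (Map.Entry<String, Cell> e : map.entrySet()) {
                // Keep the existing cell unless the incoming one is strictly newer.
                merged.merge(e.getKey(), e.getValue(),
                        (old, incoming) -> incoming.timestamp > old.timestamp ? incoming : old);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SortedMap<String, Cell> a = new TreeMap<>();
        a.put("colA", new Cell("x", 1L));
        a.put("colC", new Cell("z", 5L));

        SortedMap<String, Cell> b = new TreeMap<>();
        b.put("colA", new Cell("y", 3L));
        b.put("colB", new Cell("w", 2L));

        List<SortedMap<String, Cell>> values = new ArrayList<>();
        values.add(a);
        values.add(b);

        // Prints {colA=y@3, colB=w@2, colC=z@5}
        System.out.println(mergeRow(values));
    }
}
```

In an actual job, something like `mergeRow` would be called from the reducer with all the values collected for one row key after the shuffle has sorted the keys; that Hadoop wiring is left out here. Because the output is a `TreeMap`, the merged row comes out in column-key order.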

Maybe later down the road it will become an option to skip the maps and let
the reduce shuffle directly from the InputSplits.

Billy




"Owen O'Malley" <omalley@apache.org> wrote in 
message news:86EF314D-8058-48C8-9FA2-B7FB63563202@apache.org...
> If you don't need a sort, which is what it sounds like, Hadoop supports
> that by turning off the reduce. That is done by setting the number of
> reduces to 0. This typically is much faster than if you need the sort. It
> also sounds like you may need/want the library that does map-side joins.
> http://tinyurl.com/43j5pp
>
> -- Owen
> 


